{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:03:01Z","timestamp":1772906581034,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,11,1]],"date-time":"2021-11-01T00:00:00Z","timestamp":1635724800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,11]]},"DOI":"10.1145\/3472883.3486993","type":"proceedings-article","created":{"date-parts":[[2021,10,27]],"date-time":"2021-10-27T10:48:16Z","timestamp":1635331696000},"page":"624-638","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Scrooge"],"prefix":"10.1145","author":[{"given":"Yitao","family":"Hu","sequence":"first","affiliation":[{"name":"University of Southern California"}]},{"given":"Rajrup","family":"Ghosh","sequence":"additional","affiliation":[{"name":"University of Southern California"}]},{"given":"Ramesh","family":"Govindan","sequence":"additional","affiliation":[{"name":"University of Southern California"}]}],"member":"320","published-online":{"date-parts":[[2021,11]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"Amazon Web Services 2020. https:\/\/aws.amazon.com\/.  Amazon Web Services 2020. https:\/\/aws.amazon.com\/."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.29469"},{"key":"e_1_3_2_2_3_1","volume-title":"OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields","author":"Cao Zhe","year":"2019","unstructured":"Zhe Cao , Gines Hidalgo , Tomas Simon , Shih-En Wei , and Yaser Sheikh . 2019. 
OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields . IEEE transactions on pattern analysis and machine intelligence 43, 1 ( 2019 ), 172--186. Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2019. OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. IEEE transactions on pattern analysis and machine intelligence 43, 1 (2019), 172--186."},{"key":"e_1_3_2_2_4_1","volume-title":"Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . 2015 . Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015). Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 (2015)."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421285"},{"key":"e_1_3_2_2_6_1","volume-title":"Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17)","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw , Xin Wang , Guilio Zhou , Michael J. Franklin , Joseph E. Gonzalez , and Ion Stoica . 2017 . Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17) . USENIX Association, Boston, MA, 613--627. https:\/\/www.usenix.org\/conference\/nsdi17\/technical-sessions\/presentation\/crankshaw Daniel Crankshaw, Xin Wang, Guilio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. 2017. 
Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 613--627. https:\/\/www.usenix.org\/conference\/nsdi17\/technical-sessions\/presentation\/crankshaw"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_2_8_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_2_9_1","unstructured":"EarthCam 2021. https:\/\/www.earthcam.com\/.  EarthCam 2021. https:\/\/www.earthcam.com\/."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_33"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190541"},{"key":"e_1_3_2_2_12_1","unstructured":"Google Cloud Platform 2020. https:\/\/cloud.google.com\/.  Google Cloud Platform 2020. https:\/\/cloud.google.com\/."},{"key":"e_1_3_2_2_13_1","volume-title":"CUDA 7 Streams Simplify Concurrency","author":"Pro Tip GPU","year":"2021","unstructured":"GPU Pro Tip : CUDA 7 Streams Simplify Concurrency 2021 . https:\/\/developer.nvidia.com\/blog\/gpu-pro-tip-cuda-7-streams-simplify-concurrency\/. GPU Pro Tip: CUDA 7 Streams Simplify Concurrency 2021. https:\/\/developer.nvidia.com\/blog\/gpu-pro-tip-cuda-7-streams-simplify-concurrency\/."},{"key":"e_1_3_2_2_14_1","unstructured":"Mark Harris. 2013. CUDA Pro Tip: Understand Fat Binaries and JIT Caching. 
https:\/\/devblogs.nvidia.com\/cuda-pro-tip-understand-fat-binaries-jit-caching\/.  Mark Harris. 2013. CUDA Pro Tip: Understand Fat Binaries and JIT Caching. https:\/\/devblogs.nvidia.com\/cuda-pro-tip-understand-fat-binaries-jit-caching\/."},{"key":"e_1_3_2_2_15_1","unstructured":"Alexander Hermans Lucas Beyer and Bastian Leibe. 2017. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017).  Alexander Hermans Lucas Beyer and Bastian Leibe. 2017. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)."},{"key":"e_1_3_2_2_16_1","volume-title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs\/1704.04861","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard , Menglong Zhu , Bo Chen , Dmitry Kalenichenko , Weijun Wang , Tobias Weyand , Marco Andreetto , and Hartwig Adam . 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs\/1704.04861 ( 2017 ). arXiv:1704.04861 http:\/\/arxiv.org\/abs\/1704.04861 Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs\/1704.04861 (2017). arXiv:1704.04861 http:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_2_2_17_1","volume-title":"Focus: Querying Large Video Datasets with Low Latency and Low Cost. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Hsieh Kevin","year":"2018","unstructured":"Kevin Hsieh , Ganesh Ananthanarayanan , Peter Bodik , Shivaram Venkataraman , Paramvir Bahl , Matthai Philipose , Phillip B. Gibbons , and Onur Mutlu . 2018 . Focus: Querying Large Video Datasets with Low Latency and Low Cost. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) . 
USENIX Association, Carlsbad, CA, 269--286. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/hsieh Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Shivaram Venkataraman, Paramvir Bahl, Matthai Philipose, Phillip B. Gibbons, and Onur Mutlu. 2018. Focus: Querying Large Video Datasets with Low Latency and Low Cost. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 269--286. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/hsieh"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450268.3453521"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.351"},{"key":"e_1_3_2_2_20_1","volume-title":"The Next Step in GPU-Accelerated Deep Learning","author":"Inference","year":"2015","unstructured":"Inference : The Next Step in GPU-Accelerated Deep Learning 2015 . https:\/\/developer.nvidia.com\/blog\/inference-next-step-gpu-accelerated-deep-learning\/. Inference: The Next Step in GPU-Accelerated Deep Learning 2015. https:\/\/developer.nvidia.com\/blog\/inference-next-step-gpu-accelerated-deep-learning\/."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273005"},{"key":"e_1_3_2_2_22_1","volume-title":"Alexey Tumanov, Joseph Gonzalez, and Ion Stoica.","author":"Jain Paras","year":"2018","unstructured":"Paras Jain , Xiangxi Mo , Ajay Jain , Harikaran Subbaraj , Rehan Sohail Durrani , Alexey Tumanov, Joseph Gonzalez, and Ion Stoica. 2018 . Dynamic Space-Time Scheduling for GPU Inference . arXiv preprint arXiv:1901.00041 (2018). Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehan Sohail Durrani, Alexey Tumanov, Joseph Gonzalez, and Ion Stoica. 2018. Dynamic Space-Time Scheduling for GPU Inference. arXiv preprint arXiv:1901.00041 (2018)."},{"key":"e_1_3_2_2_23_1","volume-title":"The OoO VLIW JIT Compiler for GPU Inference. 
arXiv preprint arXiv:1901.10008","author":"Jain Paras","year":"2019","unstructured":"Paras Jain , Xiangxi Mo , Ajay Jain , Alexey Tumanov , Joseph E Gonzalez , and Ion Stoica . 2019. The OoO VLIW JIT Compiler for GPU Inference. arXiv preprint arXiv:1901.10008 ( 2019 ). Paras Jain, Xiangxi Mo, Ajay Jain, Alexey Tumanov, Joseph E Gonzalez, and Ion Stoica. 2019. The OoO VLIW JIT Compiler for GPU Inference. arXiv preprint arXiv:1901.10008 (2019)."},{"key":"e_1_3_2_2_24_1","volume-title":"13th { USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 783--798.","author":"Kalavri Vasiliki","unstructured":"Vasiliki Kalavri , John Liagouris , Moritz Hoffmann , Desislava Dimitrova , Matthew Forshaw , and Timothy Roscoe . 2018. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows . In 13th { USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 783--798. Vasiliki Kalavri, John Liagouris, Moritz Hoffmann, Desislava Dimitrova, Matthew Forshaw, and Timothy Roscoe. 2018. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. In 13th { USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 783--798."},{"key":"e_1_3_2_2_25_1","volume-title":"15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18)","author":"Katsikas Georgios P.","unstructured":"Georgios P. Katsikas , Tom Barbette , Dejan Kosti\u0107 , Rebecca Steinert , and Gerald Q . Maguire Jr. 2018. Metron: NFV Service Chains at the True Speed of the Underlying Hardware . In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18) . USENIX Association, Renton, WA, 171--186. https:\/\/www.usenix.org\/conference\/nsdi18\/presentation\/katsikas Georgios P. Katsikas, Tom Barbette, Dejan Kosti\u0107, Rebecca Steinert, and Gerald Q. Maguire Jr. 2018. 
Metron: NFV Service Chains at the True Speed of the Underlying Hardware. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, Renton, WA, 171--186. https:\/\/www.usenix.org\/conference\/nsdi18\/presentation\/katsikas"},{"key":"e_1_3_2_2_26_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet Classification with Deep Convolutional Neural Networks. In Advances in neural information processing systems. 1097--1105.  Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet Classification with Deep Convolutional Neural Networks. In Advances in neural information processing systems. 1097--1105."},{"key":"e_1_3_2_2_27_1","volume-title":"Deep learning. nature 521, 7553","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun , Yoshua Bengio , and Geoffrey Hinton . 2015. Deep learning. nature 521, 7553 ( 2015 ), 436--444. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. nature 521, 7553 (2015), 436--444."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3356250.3360041"},{"key":"e_1_3_2_2_29_1","unstructured":"Gurobi Optimization LLC. 2020. Gurobi Optimizer Reference Manual. http:\/\/www.gurobi.com.  Gurobi Optimization LLC. 2020. Gurobi Optimizer Reference Manual. http:\/\/www.gurobi.com."},{"key":"e_1_3_2_2_30_1","unstructured":"Microsoft Azure 2020. https:\/\/azure.microsoft.com\/en-us\/.  Microsoft Azure 2020. https:\/\/azure.microsoft.com\/en-us\/."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522738"},{"key":"e_1_3_2_2_32_1","unstructured":"NVIDIA System Management Interface 2012. https:\/\/developer.nvidia.com\/nvidia-system-management-interface.  NVIDIA System Management Interface 2012. https:\/\/developer.nvidia.com\/nvidia-system-management-interface."},{"key":"e_1_3_2_2_33_1","unstructured":"NVIDIA Triton Inference Server 2021. 
https:\/\/developer.nvidia.com\/nvidia-triton-inference-server.  NVIDIA Triton Inference Server 2021. https:\/\/developer.nvidia.com\/nvidia-triton-inference-server."},{"key":"e_1_3_2_2_34_1","unstructured":"NVIDIA's TensorRT 2019. https:\/\/developer.nvidia.com\/tensorrt.  NVIDIA's TensorRT 2019. https:\/\/developer.nvidia.com\/tensorrt."},{"key":"e_1_3_2_2_35_1","unstructured":"ONNX 2021. https:\/\/onnx.ai\/.  ONNX 2021. https:\/\/onnx.ai\/."},{"key":"e_1_3_2_2_36_1","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Panda Aurojit","year":"2016","unstructured":"Aurojit Panda , Sangjin Han , Keon Jang , Melvin Walls , Sylvia Ratnasamy , and Scott Shenker . 2016 . NetBricks: Taking the V out of NFV . In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) . USENIX Association, Savannah, GA, 203--216. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/panda Aurojit Panda, Sangjin Han, Keon Jang, Melvin Walls, Sylvia Ratnasamy, and Scott Shenker. 2016. NetBricks: Taking the V out of NFV. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, Savannah, GA, 203--216. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/panda"},{"key":"e_1_3_2_2_37_1","unstructured":"PyTorch 2021. https:\/\/pytorch.org\/.  PyTorch 2021. https:\/\/pytorch.org\/."},{"key":"e_1_3_2_2_38_1","unstructured":"Realtime action recognition 2019. https:\/\/github.com\/felixchenfy\/Realtime-Action-Recognition.  Realtime action recognition 2019. https:\/\/github.com\/felixchenfy\/Realtime-Action-Recognition."},{"key":"e_1_3_2_2_39_1","volume-title":"YOLOv3: an Incremental Improvement. arXiv","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi . 2018. YOLOv3: an Incremental Improvement. arXiv ( 2018 ). Joseph Redmon and Ali Farhadi. 2018. YOLOv3: an Incremental Improvement. 
arXiv (2018)."},{"key":"e_1_3_2_2_40_1","volume-title":"INFaaS: A model-less inference serving system. arXiv preprint arXiv:1905.13348","author":"Romero Francisco","year":"2019","unstructured":"Francisco Romero , Qian Li , Neeraja J Yadwadkar , and Christos Kozyrakis . 2019. INFaaS: A model-less inference serving system. arXiv preprint arXiv:1905.13348 ( 2019 ). Francisco Romero, Qian Li, Neeraja J Yadwadkar, and Christos Kozyrakis. 2019. INFaaS: A model-less inference serving system. arXiv preprint arXiv:1905.13348 (2019)."},{"key":"e_1_3_2_2_41_1","volume-title":"Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines. arXiv preprint arXiv:2102.01887","author":"Romero Francisco","year":"2021","unstructured":"Francisco Romero , Mark Zhao , Neeraja J Yadwadkar , and Christos Kozyrakis . 2021 . Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines. arXiv preprint arXiv:2102.01887 (2021). Francisco Romero, Mark Zhao, Neeraja J Yadwadkar, and Christos Kozyrakis. 2021. Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines. arXiv preprint arXiv:2102.01887 (2021)."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359658"},{"key":"e_1_3_2_2_43_1","volume-title":"Inception-Resnet and the Impact of Residual Connections on Learning. In Thirty-First AAAI Conference on Artificial Intelligence.","author":"Szegedy Christian","year":"2017","unstructured":"Christian Szegedy , Sergey Ioffe , Vincent Vanhoucke , and Alexander A Alemi . 2017 . Inception-v4 , Inception-Resnet and the Impact of Residual Connections on Learning. In Thirty-First AAAI Conference on Artificial Intelligence. Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. 2017. Inception-v4, Inception-Resnet and the Impact of Residual Connections on Learning. 
In Thirty-First AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_2_44_1","unstructured":"TensorFlow 2021. https:\/\/www.tensorflow.org\/.  TensorFlow 2021. https:\/\/www.tensorflow.org\/."},{"key":"e_1_3_2_2_45_1","unstructured":"TensorFlow Serving 2021. https:\/\/github.com\/tensorflow\/serving.  TensorFlow Serving 2021. https:\/\/github.com\/tensorflow\/serving."},{"key":"e_1_3_2_2_46_1","unstructured":"The Private Life of MP3 Frames 2021. http:\/\/id3lib.sourceforge.net\/id3\/mp3frame.html.  The Private Life of MP3 Frames 2021. http:\/\/id3lib.sourceforge.net\/id3\/mp3frame.html."},{"key":"e_1_3_2_2_47_1","unstructured":"TorchServe 2021. https:\/\/pytorch.org\/serve\/.  TorchServe 2021. https:\/\/pytorch.org\/serve\/."},{"key":"e_1_3_2_2_48_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008.  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349318"},{"key":"e_1_3_2_2_51_1","volume-title":"Elastic Scaling of Stateful Network Functions. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18)","author":"Woo Shinae","year":"2018","unstructured":"Shinae Woo , Justine Sherry , Sangjin Han , Sue Moon , Sylvia Ratnasamy , and Scott Shenker . 2018 . Elastic Scaling of Stateful Network Functions. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18) . USENIX Association, Renton, WA, 299--312. 
https:\/\/www.usenix.org\/conference\/nsdi18\/presentation\/woo Shinae Woo, Justine Sherry, Sangjin Han, Sue Moon, Sylvia Ratnasamy, and Scott Shenker. 2018. Elastic Scaling of Stateful Network Functions. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, Renton, WA, 299--312. https:\/\/www.usenix.org\/conference\/nsdi18\/presentation\/woo"},{"key":"e_1_3_2_2_52_1","volume-title":"Gandiva: Introspective Cluster Scheduling for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao , Romil Bhardwaj , Ramachandran Ramjee , Muthian Sivathanu , Nipun Kwatra , Zhenhua Han , Pratyush Patel , Xuan Peng , Hanyu Zhao , Quanlu Zhang , Fan Yang , and Lidong Zhou . 2018 . Gandiva: Introspective Cluster Scheduling for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) . USENIX Association, Carlsbad, CA, 595--610. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/xiao Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. 2018. Gandiva: Introspective Cluster Scheduling for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 595--610. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/xiao"},{"key":"e_1_3_2_2_53_1","first-page":"10","article-title":"Spark: Cluster computing with working sets","volume":"10","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia , Mosharaf Chowdhury , Michael J Franklin , Scott Shenker , Ion Stoica , 2010 . Spark: Cluster computing with working sets . HotCloud 10 , 10 - 10 (2010), 95. Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, Ion Stoica, et al. 2010. 
Spark: Cluster computing with working sets. HotCloud 10, 10-10 (2010), 95.","journal-title":"HotCloud"},{"key":"e_1_3_2_2_54_1","volume-title":"Mark: Exploiting cloud services for cost-effective, slo-aware machine learning inference serving. In 2019 {USENIX} Annual Technical Conference ({USENIX} {ATC } 19). 1049--1062.","author":"Zhang Chengliang","year":"2019","unstructured":"Chengliang Zhang , Minchen Yu , Wei Wang , and Feng Yan . 2019 . Mark: Exploiting cloud services for cost-effective, slo-aware machine learning inference serving. In 2019 {USENIX} Annual Technical Conference ({USENIX} {ATC } 19). 1049--1062. Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan. 2019. Mark: Exploiting cloud services for cost-effective, slo-aware machine learning inference serving. In 2019 {USENIX} Annual Technical Conference ({USENIX} {ATC } 19). 1049--1062."},{"key":"e_1_3_2_2_55_1","volume-title":"14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 377--392.","author":"Zhang Haoyu","unstructured":"Haoyu Zhang , Ganesh Ananthanarayanan , Peter Bodik , Matthai Philipose , Paramvir Bahl , and Michael J Freedman . 2017. Live video analytics at scale with approximation and delay-tolerance . In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 377--392. Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J Freedman. 2017. Live video analytics at scale with approximation and delay-tolerance. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 
377--392."}],"event":{"name":"SoCC '21: ACM Symposium on Cloud Computing","location":"Seattle WA USA","acronym":"SoCC '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the ACM Symposium on Cloud Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472883.3486993","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472883.3486993","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:57Z","timestamp":1750191117000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472883.3486993"}},"subtitle":["A Cost-Effective Deep Learning Inference System"],"short-title":[],"issued":{"date-parts":[[2021,11]]},"references-count":55,"alternative-id":["10.1145\/3472883.3486993","10.1145\/3472883"],"URL":"https:\/\/doi.org\/10.1145\/3472883.3486993","relation":{},"subject":[],"published":{"date-parts":[[2021,11]]},"assertion":[{"value":"2021-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}