{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T10:02:25Z","timestamp":1770458545203,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":52,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,12,2]],"date-time":"2021-12-02T00:00:00Z","timestamp":1638403200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CCF-1845706, IIS-1852606"],"award-info":[{"award-number":["CCF-1845706, IIS-1852606"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,6]]},"DOI":"10.1145\/3464298.3493391","type":"proceedings-article","created":{"date-parts":[[2021,12,2]],"date-time":"2021-12-02T23:39:52Z","timestamp":1638488392000},"page":"146-158","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["SwitchFlow"],"prefix":"10.1145","author":[{"given":"Xiaofeng","family":"Wu","sequence":"first","affiliation":[{"name":"The University of Texas at Arlington"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jia","family":"Rao","sequence":"additional","affiliation":[{"name":"The University of Texas at Arlington"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Chen","sequence":"additional","affiliation":[{"name":"Nvidia Corporation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hang","family":"Huang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chris","family":"Ding","sequence":"additional","affiliation":[{"name":"The University of Texas at Arlington"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heng","family":"Huang","sequence":"additional","affiliation":[{"name":"University of Pittsburgh"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,12,2]]},"reference":[
{"key":"e_1_3_2_1_1_1","volume-title":"Retrieved","year":"2021","unstructured":"2015. How to prevent tensorflow from allocating the totality of a GPU memory. Retrieved Jan 16, 2021 from https:\/\/stackoverflow.com\/questions\/34199233\/"},
{"key":"e_1_3_2_1_2_1","volume-title":"English-German WMT'16 Translation Task. Retrieved","year":"2021","unstructured":"2016. English-German WMT'16 Translation Task. Retrieved Jan 16, 2021 from http:\/\/www.statmt.org\/wmt16\/translation-task.html"},
{"key":"e_1_3_2_1_3_1","volume-title":"Retrieved","year":"2021","unstructured":"2018. Tips to Improve Performance for Popular Deep Learning Frameworks on CPUs. Retrieved Jan 16, 2021 from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/tips-to-improve-performance-for-popular-deep-learning-frameworks-on-multi-core-cpus.htm"},
{"key":"e_1_3_2_1_4_1","volume-title":"Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-scale Machine Learning. In Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},
{"key":"e_1_3_2_1_5_1","volume-title":"The Case for GPGPU Spatial Multitasking. In Proc. of IEEE International Symposium on High-Performance Computer Architecture (HPCA).","author":"Adriaens Jacob T","year":"2012","unstructured":"Jacob T Adriaens, Katherine Compton, Nam Sung Kim, and Michael J Schulte. 2012. The Case for GPGPU Spatial Multitasking. In Proc. of IEEE International Symposium on High-Performance Computer Architecture (HPCA)."},
{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330667"},
{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173169"},
{"key":"e_1_3_2_1_8_1","volume-title":"Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. 2020. PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications. In Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},
{"key":"e_1_3_2_1_9_1","unstructured":"Thomas Bradley. 2013. Hyper-Q. http:\/\/developer.download.nvidia.com\/compute\/DevZone\/C\/html_x64\/6_Advanced\/simpleHyperQ\/doc\/HyperQ.pdf."},
{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50012-5"},
{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018743.3018748"},
{"key":"e_1_3_2_1_12_1","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient Primitives for Deep Learning. In arXiv preprint arXiv:1410.0759."},
{"key":"e_1_3_2_1_13_1","unstructured":"Dami Choi, Alexandre Passos, Christopher J Shallue, and George E Dahl. 2019. Faster Neural Network Training with Data Echoing. In arXiv preprint arXiv:1907.05550."},
{"key":"e_1_3_2_1_14_1","volume-title":"Proc. of USENIX Symposium on Networked Systems Design and Implementation (NSDI).","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J Franklin, Joseph E Gonzalez, and Ion Stoica. 2017. Clipper: A Low-latency Online Prediction Serving System. In Proc. of USENIX Symposium on Networked Systems Design and Implementation (NSDI)."},
{"key":"e_1_3_2_1_15_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In arXiv preprint arXiv:1810.04805.","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In arXiv preprint arXiv:1810.04805."},
{"key":"e_1_3_2_1_16_1","volume-title":"Retrieved","year":"2020","unstructured":"Google. 2020. Use a GPU in TensorFlow. Retrieved Jan 16, 2021 from https:\/\/www.tensorflow.org\/guide\/gpu"},
{"key":"e_1_3_2_1_17_1","volume-title":"Retrieved","year":"2021","unstructured":"Google. 2021. TensorFlow Profiler. Retrieved Jan 16, 2021 from https:\/\/www.tensorflow.org\/guide\/profiler"},
{"key":"e_1_3_2_1_18_1","volume-title":"Speech Recognition with Deep Recurrent Neural Networks. In Proc. of International Conference on Acoustics, Speech and Signal Processing.","author":"Graves Alex","year":"2013","unstructured":"Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. 2013. Speech Recognition with Deep Recurrent Neural Networks. In Proc. of International Conference on Acoustics, Speech and Signal Processing."},
{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},
{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274808.3274813"},
{"key":"e_1_3_2_1_21_1","volume-title":"Proc. of Conference on Neural Information Processing Systems (NeurIPS).","author":"Jain Paras","year":"2011","unstructured":"Paras Jain, Xiangxi Mo, Ajay Jain, Harikaran Subbaraj, Rehan Sohail Durrani, Alexey Tumanov, Joseph Gonzalez, and Ion Stoica. 2011. Dynamic Space-Time Scheduling for GPU Inference. In Proc. of Conference on Neural Information Processing Systems (NeurIPS)."},
{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},
{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. In Nature.","DOI":"10.1038\/nature14539"},
{"key":"e_1_3_2_1_25_1","volume-title":"Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Lee Yunseong","year":"2018","unstructured":"Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco Domenico Santambrogio, Markus Weimer, and Matteo Interlandi. 2018. PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems. In Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},
{"key":"e_1_3_2_1_26_1","volume-title":"Kyle Rupnow, Rick Siow Mong Goh, and Deming Chen.","author":"Liang Yun","year":"2014","unstructured":"Yun Liang, Huynh Phung Huynh, Kyle Rupnow, Rick Siow Mong Goh, and Deming Chen. 2014. Efficient GPU Spatial-temporal Multitasking. In IEEE Transactions on Parallel and Distributed Systems."},
{"key":"e_1_3_2_1_27_1","volume-title":"Proc. of ML Systems Workshop in NIPS.","author":"Meng Chen","year":"2017","unstructured":"Chen Meng, Minmin Sun, Jun Yang, Minghui Qiu, and Yang Gu. 2017. Training Deeper Models by GPU Memory Optimization on TensorFlow. In Proc. of ML Systems Workshop in NIPS."},
{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-815480-9.00015-3"},
{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, and Vijay Chidambaram. 2020. Analyzing and Mitigating Data Stalls in DNN Training. (2020).","DOI":"10.14778\/3446095.3446100"},
{"key":"e_1_3_2_1_30_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2017","unstructured":"NVIDIA. 2017. Maximizing Unified Memory Performance in CUDA. Retrieved Jan 16, 2021 from https:\/\/devblogs.nvidia.com\/maximizing-unified-memory-performance-cuda\/"},
{"key":"e_1_3_2_1_31_1","volume-title":"NVIDIA Virtual GPU Software Documentation. Retrieved","author":"NVIDIA.","year":"2020","unstructured":"NVIDIA. 2018. NVIDIA Virtual GPU Software Documentation. Retrieved Nov 4, 2020 from https:\/\/docs.nvidia.com\/grid\/latest\/grid-vgpu-user-guide\/index.html"},
{"key":"e_1_3_2_1_32_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2019","unstructured":"NVIDIA. 2019. GPUDirect Storage: A Direct Path Between Storage and GPU Memory. Retrieved Jan 16, 2021 from https:\/\/devblogs.nvidia.com\/gpudirect-storage\/"},
{"key":"e_1_3_2_1_33_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2020","unstructured":"NVIDIA. 2020. CUDA Basic Linear Algebra Subroutine library. Retrieved Jan 16, 2021 from https:\/\/docs.nvidia.com\/cuda\/cublas\/index.html"},
{"key":"e_1_3_2_1_34_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2020","unstructured":"NVIDIA. 2020. Multi-Process Service. Retrieved Jan 16, 2021 from https:\/\/docs.nvidia.com\/deploy\/mps\/index.html"},
{"key":"e_1_3_2_1_35_1","volume-title":"Retrieved","author":"NVIDIA.","year":"2020","unstructured":"NVIDIA. 2020. The user manual for NVIDIA profiling tools for optimizing performance of CUDA applications. Retrieved Jan 16, 2021 from https:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/index.html"},
{"key":"e_1_3_2_1_36_1","unstructured":"NVIDIA. November 28, 2019. CUDA Occupancy Calculator. https:\/\/docs.nvidia.com\/cuda\/cuda-occupancy-calculator\/index.html."},
{"key":"e_1_3_2_1_37_1","volume-title":"High-performance ML Serving. In Proc. of Neural Information Processing Systems (NeurIPS)","author":"Olston Christopher","year":"2017","unstructured":"Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, and Jordan Soyke. 2017. TensorFlow-Serving: Flexible, High-performance ML Serving. In Proc. of Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},
{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2451116.2451160"},
{"key":"e_1_3_2_1_39_1","volume-title":"Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).","author":"Kyu Park Jason Jong","year":"2015","unstructured":"Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke. 2015. Chimera: Collaborative preemption for multitasking on a shared GPU. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},
{"key":"e_1_3_2_1_40_1","volume-title":"Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).","author":"Kyu Park Jason Jong","year":"2017","unstructured":"Jason Jong Kyu Park, Yongjun Park, and Scott Mahlke. 2017. Dynamic Resource Management for Efficient Utilization of Multitasking GPUs. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},
{"key":"e_1_3_2_1_41_1","unstructured":"Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic Differentiation in PyTorch."},
{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359642"},
{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783721"},
{"key":"e_1_3_2_1_44_1","unstructured":"Sebastian Ruder. 2017. An Overview of Multi-task Learning in Deep Neural Networks. In arXiv preprint arXiv:1706.05098."},
{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853208"},
{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.21"},
{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2011.5999803"},
{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446078"},
{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2751205.2751213"},
{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037742"},
{"key":"e_1_3_2_1_51_1","volume-title":"Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. 2018. Gandiva: Introspective Cluster Scheduling for Deep Learning. In Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},
{"key":"e_1_3_2_1_52_1","volume-title":"Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI).","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. 2020. AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. In Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI)."},
{"key":"e_1_3_2_1_53_1","volume-title":"SMGuard: A Flexible and Fine-grained Resource Management Framework for GPUs","author":"Yu Chao","unstructured":"Chao Yu, Yuebin Bai, Hailong Yang, Kun Cheng, Yuhao Gu, Zhongzhi Luan, and Depei Qian. 2018. SMGuard: A Flexible and Fine-grained Resource Management Framework for GPUs. In IEEE Transactions on Parallel and Distributed Systems."}
],"event":{"name":"Middleware '21: 22nd International Middleware Conference","location":"Qu\u00e9bec city Canada","acronym":"Middleware '21","sponsor":["ACM Association for Computing Machinery","USENIX Assoc","IFIP"]},"container-title":["Proceedings of the 22nd International Middleware Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464298.3493391","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3464298.3493391","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3464298.3493391","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:15Z","timestamp":1750191135000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464298.3493391"}},"subtitle":["preemptive multitasking for deep learning"],"short-title":[],"issued":{"date-parts":[[2021,12,2]]},"references-count":52,"alternative-id":["10.1145\/3464298.3493391","10.1145\/3464298"],"URL":"https:\/\/doi.org\/10.1145\/3464298.3493391","relation":{},"subject":[],"published":{"date-parts":[[2021,12,2]]},"assertion":[{"value":"2021-12-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}