{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T16:15:15Z","timestamp":1781885715159,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,4,22]],"date-time":"2024-04-22T00:00:00Z","timestamp":1713744000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100006374","name":"Danmarks Frie Forskningsfond","doi-asserted-by":"publisher","award":["0171-00061B"],"award-info":[{"award-number":["0171-00061B"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,4,22]]},"DOI":"10.1145\/3642970.3655827","type":"proceedings-article","created":{"date-parts":[[2024,4,19]],"date-time":"2024-04-19T10:46:57Z","timestamp":1713523617000},"page":"81-90","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["An Analysis of Collocation on GPUs for Deep Learning Training"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-3451-5602","authenticated-orcid":false,"given":"Ties","family":"Robroek","sequence":"first","affiliation":[{"name":"IT University of Copenhagen"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0156-1435","authenticated-orcid":false,"given":"Ehsan","family":"Yousefzadeh-Asl-Miandoab","sequence":"additional","affiliation":[{"name":"IT University of Copenhagen"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6838-4854","authenticated-orcid":false,"given":"P\u0131nar","family":"T\u00f6z\u00fcn","sequence":"additional","affiliation":[{"name":"IT University of Copenhagen"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,4,22]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[n.d.]. GPU Pro Tip: CUDA 7 Streams Simplify Concurrency. https:\/\/developer.nvidia.com\/blog\/gpu-pro-tip-cuda-7-streams-simplify-concurrency\/. Accessed: 2022-10-21."},{"key":"e_1_3_2_1_2_1","volume-title":"Sebastian Benjamin Wrede, and Pinar T\u00f6z\u00fcn","author":"Baunsgaard Sebastian","year":"2020","unstructured":"Sebastian Baunsgaard, Sebastian Benjamin Wrede, and Pinar T\u00f6z\u00fcn. 2020. Training for Speech Recognition on Coprocessors. In ADMS."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926271"},{"key":"e_1_3_2_1_4_1","volume-title":"A Down-sampled Variant of ImageNet as an Alternative to the CIFAR datasets. CoRR arXiv","author":"Chrabaszcz Patryk","year":"2017","unstructured":"Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. 2017. A Down-sampled Variant of ImageNet as an Alternative to the CIFAR datasets. CoRR arXiv (2017)."},{"key":"e_1_3_2_1_5_1","unstructured":"Criteo. [n.d.]. Criteo 1TB Click Logs dataset. https:\/\/www.criteo.com\/news\/press-releases\/2015\/07\/criteo-releases-industrys-largest-ever-dataset\/."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00027"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3508036"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541941"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_10_1","volume-title":"Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI. 295--308.","author":"Hindman Benjamin","year":"2011","unstructured":"Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In NSDI. 295--308."},{"key":"e_1_3_2_1_11_1","volume-title":"Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Jeon Myeongjae","year":"2019","unstructured":"Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. 2019. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 947--960."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342276"},{"key":"e_1_3_2_1_13_1","unstructured":"Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. Technical Report. University of Toronto."},{"key":"e_1_3_2_1_14_1","volume-title":"Characterizing Multi-Instance GPU for Machine Learning Workloads. In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 724--731","author":"Li Baolin","year":"2022","unstructured":"Baolin Li, Viiay Gadepally, Siddharth Samsi, and Devesh Tiwari. 2022. Characterizing Multi-Instance GPU for Machine Learning Workloads. In 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 724--731."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3542929.3563510"},{"key":"e_1_3_2_1_16_1","unstructured":"Maxim Naumov Dheevatsa Mudigere Hao-Jun Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs\/1906.00091 (2019). https:\/\/arxiv.org\/abs\/1906.00091"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337891"},{"key":"e_1_3_2_1_18_1","unstructured":"NVIDIA 2021. NVIDIA Multi-Instance GPU User Guide. NVIDIA. https:\/\/docs.nvidia.com\/datacenter\/tesla\/mig-user-guide\/."},{"key":"e_1_3_2_1_19_1","unstructured":"NVIDIA. 2022. Data Center GPU Manager Documentation. Technical Report. NVIDIA. https:\/\/docs.nvidia.com\/datacenter\/dcgm\/latest\/dcgm-user-guide\/."},{"key":"e_1_3_2_1_20_1","volume-title":"Technical Report","author":"Multi-Process Service NVIDIA.","unstructured":"NVIDIA. 2022. Multi-Process Service. Technical Report. NVIDIA Corporation. https:\/\/docs.nvidia.com\/deploy\/pdf\/CUDA_Multi_Process_Service_Overview.pdf"},{"key":"e_1_3_2_1_21_1","first-page":"8024","article-title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","volume":"32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. 8024--8035.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_22_1","volume-title":"Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework (HPDC '11)","author":"Ravi Vignesh T.","unstructured":"Vignesh T. Ravi, Michela Becchi, Gagan Agrawal, and Srimat Chakradhar. 2011. Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework (HPDC '11). Association for Computing Machinery, 217--228."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3595360.3595851"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Ties Robroek Ehsan Yousefzadeh-Asl-Miandoab and Pinar T\u00f6z\u00fcn. 2023. An Analysis of Collocation on GPUs for Deep Learning Training. arXiv:2209.06018 [cs.LG]","DOI":"10.1145\/3642970.3655827"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_26_1","volume-title":"Le","author":"Tan Mingxing","year":"2021","unstructured":"Mingxing Tan and Quoc V. Le. 2021. EfficientNetV2: Smaller Models and Faster Training. CoRR abs\/2104.00298 (2021). arXiv:2104.00298 https:\/\/arxiv.org\/abs\/2104.00298"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00010"},{"key":"e_1_3_2_1_28_1","first-page":"599","article-title":"Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models","volume":"3","author":"Wang Shang","year":"2021","unstructured":"Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, and Gennady Pekhimenko. 2021. Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models. Proceedings of Machine Learning and Systems 3 (2021), 599--623.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446078"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080203"},{"key":"e_1_3_2_1_31_1","volume-title":"MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Weng Qizhen","year":"2022","unstructured":"Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, and Yu Ding. 2022. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, 945--960. https:\/\/www.usenix.org\/conference\/nsdi22\/presentation\/weng"},{"key":"e_1_3_2_1_32_1","unstructured":"Ross Wightman. 2019. PyTorch Image Models. https:\/\/github.com\/rwightman\/pytorch-image-models."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001161"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3079202"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3578356.3592589"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3205289.3205311"}],"event":{"name":"EuroSys '24: Nineteenth European Conference on Computer Systems","location":"Athens Greece","acronym":"EuroSys '24","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the 4th Workshop on Machine Learning and Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3642970.3655827","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3642970.3655827","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T00:16:05Z","timestamp":1755908165000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3642970.3655827"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,22]]},"references-count":36,"alternative-id":["10.1145\/3642970.3655827","10.1145\/3642970"],"URL":"https:\/\/doi.org\/10.1145\/3642970.3655827","relation":{},"subject":[],"published":{"date-parts":[[2024,4,22]]},"assertion":[{"value":"2024-04-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}