{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T20:56:30Z","timestamp":1773867390525,"version":"3.50.1"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T00:00:00Z","timestamp":1742515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2023YFB3001504"],"award-info":[{"award-number":["2023YFB3001504"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62302302, 62232011"],"award-info":[{"award-number":["62302302, 62232011"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Natural Science Foundation of Shanghai Municipality","award":["24ZR1430500"],"award-info":[{"award-number":["24ZR1430500"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            Job packing is an effective technique to harvest the idle resources allocated to the deep learning (DL) training jobs but not fully utilized, especially when clusters may experience low utilization, and users may overestimate their resource needs. However, existing job packing techniques tend to be conservative due to the mismatch in scope and granularity between job packing and cluster scheduling. In particular, tapping the potential of job packing in the training cluster requires a local and fine-grained coordination mechanism. 
To this end, we propose a novel job-packing middleware named\n            <jats:sc>Gimbal<\/jats:sc>\n            , which operates between the cluster scheduler and the hardware resources. As middleware,\n            <jats:sc>Gimbal<\/jats:sc>\n            must not only facilitate coordination among the packed jobs but also support various scheduling objectives of different schedulers.\n            <jats:sc>Gimbal<\/jats:sc>\n            achieves dual functionality by introducing a set of worker calibration primitives designed to calibrate workers\u2019 execution status in a fine-grained manner. The primitives obscure the complexity of the underlying job and resource management mechanisms, thus offering the generality and extensibility for crafting coordination policies tailored to various scheduling objectives. We implement\n            <jats:sc>Gimbal<\/jats:sc>\n            on a real-world GPU cluster and evaluate it with a set of representative DL training jobs. The results show that\n            <jats:sc>Gimbal<\/jats:sc>\n            improves different scheduling objectives up to 1.32\u00d7 compared with the state-of-the-art job packing techniques.\n          <\/jats:p>","DOI":"10.1145\/3711927","type":"journal-article","created":{"date-parts":[[2025,1,13]],"date-time":"2025-01-13T11:34:46Z","timestamp":1736768086000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Taming Flexible Job Packing in Deep Learning Training Clusters"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-6225-2139","authenticated-orcid":false,"given":"Pengyu","family":"Yang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6646-5260","authenticated-orcid":false,"given":"Weihao","family":"Cui","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, 
China and National University of Singapore, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9272-1732","authenticated-orcid":false,"given":"Chunyu","family":"Xue","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1561-5329","authenticated-orcid":false,"given":"Han","family":"Zhao","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9480-5632","authenticated-orcid":false,"given":"Chen","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5832-0347","authenticated-orcid":false,"given":"Quan","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-8456-2420","authenticated-orcid":false,"given":"Jing","family":"Yang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China and State Key Laboratory of Public Big Data, Guizhou University, Guiyang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0034-2302","authenticated-orcid":false,"given":"Minyi","family":"Guo","sequence":"additional","affiliation":[{"name":"Computer Science, Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2025,3,21]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2017. Nvidia Volta Architecture. Retrieved June 30 2024 from https:\/\/www.nvidia.com\/en-us\/data-center\/volta-gpu-architecture\/. (2017)."},{"key":"e_1_3_1_3_2","unstructured":"2019. Pegasus: Makes the Work Flow. Retrieved June 30 2024 from https:\/\/pegasus.isi.edu\/. (2019)."},{"key":"e_1_3_1_4_2","unstructured":"2020. Nvidia Ampere Architecture. 
Retrieved June 30 2024 from https:\/\/www.nvidia.com\/en-us\/data-center\/ampere-architecture\/. (2020)."},{"key":"e_1_3_1_5_2","unstructured":"2023. gRPC: An RPC library and framework. Retrieved June 30 2024 from https:\/\/grpc.io. (2023)."},{"key":"e_1_3_1_6_2","unstructured":"2024. Slurm workload manager. Retrieved June 30 2024 from https:\/\/slurm.schedmd.com\/documentation.html. (2024)."},{"key":"e_1_3_1_7_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et\u00a0al. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation. 265\u2013283."},{"key":"e_1_3_1_8_2","first-page":"173","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et\u00a0al. 2016. Deep speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the International Conference on Machine Learning. PMLR, 173\u2013182."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","unstructured":"Zhengda Bian Shenggui Li Wei Wang and Yang You. 2021. Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters. In Proceedings of the International Conference for High Performance Computing Networking Storage and Analysis. Association for Computing Machinery New York NY USA Article 100 15 pages. 
DOI:10.1145\/3458817.3480859","DOI":"10.1145\/3458817.3480859"},{"key":"e_1_3_1_10_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D. Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et\u00a0al. 2020. Language models are few-shot learners. In Proceedings of the 34th International Conference on Neural Information Processing Systems (Vancouver BC Canada) (NIPS\u201920). Curran Associates Inc. Red Hook NY USA Article 159 25 pages."},{"key":"e_1_3_1_11_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Cai Han","year":"2020","unstructured":"Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2020. Once for All: Train one network and specialize it for efficient deployment. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/arxiv.org\/pdf\/1908.09791.pdf"},{"key":"e_1_3_1_12_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Cai Han","year":"2019","unstructured":"Han Cai, Ligeng Zhu, and Song Han. 2019. ProxylessNAS: Direct neural architecture search on target task and hardware. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/arxiv.org\/pdf\/1812.00332.pdf"},{"key":"e_1_3_1_13_2","first-page":"1","volume-title":"Proceedings of the 15th European Conference on Computer Systems","author":"Chaudhary Shubham","year":"2020","unstructured":"Shubham Chaudhary, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, and Srinidhi Viswanatha. 2020. Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning. In Proceedings of the 15th European Conference on Computer Systems. 
1\u201316."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3419111.3421299"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/2954679.2872368"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304005"},{"key":"e_1_3_1_17_2","unstructured":"NVIDIA Corporation. 2023. NVIDIA Multi-Instance GPU User Guide. (2023). Retrieved June 30 2024 from https:\/\/docs.nvidia.com\/datacenter\/tesla\/mig-user-guide\/index.html"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476143"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_20_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575721"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00140"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575705"},{"key":"e_1_3_1_25_2","first-page":"19","volume-title":"Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation","author":"Jajoo Akshay","year":"2022","unstructured":"Akshay Jajoo, Y. Charlie Hu, Xiaojun Lin, and Nan Deng. 2022. A case for task sampling based learning for cluster job scheduling. In Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation. USENIX Association, Renton, WA, 19\u201333. 
Retrieved from https:\/\/www.usenix.org\/conference\/nsdi22\/presentation\/jajoo"},{"key":"e_1_3_1_26_2","doi-asserted-by":"crossref","first-page":"642","DOI":"10.1145\/3600006.3613175","volume-title":"Proceedings of the 29th Symposium on Operating Systems Principles","author":"Subramanya Suhas Jayaram","year":"2023","unstructured":"Suhas Jayaram Subramanya, Daiyaan Arfeen, Shouxu Lin, Aurick Qiao, Zhihao Jia, and Gregory R. Ganger. 2023. Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling. In Proceedings of the 29th Symposium on Operating Systems Principles. 642\u2013657."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.5555\/3358807.3358888"},{"key":"e_1_3_1_28_2","unstructured":"Alex Krizhevsky. 2023. The CIFAR10 Dataset. Retrieved June 30 2024 from https:\/\/cs.toronto.edu\/kriz\/cifar.html. (2023)."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607104"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2749475"},{"key":"e_1_3_1_31_2","first-page":"937","volume-title":"Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation","author":"Mai Luo","year":"2020","unstructured":"Luo Mai, Guo Li, Marcel Wagenl\u00e4nder, Konstantinos Fertakis, Andrei-Octavian Brabete, and Peter Pietzuch. 2020. KungFu: Making training in distributed machine learning adaptive. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, 937\u2013954. Retrieved from https:\/\/www.usenix.org\/conference\/osdi20\/presentation\/mai"},{"key":"e_1_3_1_32_2","unstructured":"Meta. 2024. Building Meta\u2019s GenAI infrastructure. (2024). 
Retrieved June 23 2024 from https:\/\/engineering.fb.com\/2024\/03\/12\/data-center-engineering\/building-metas-genai-infrastructure\/"},{"key":"e_1_3_1_33_2","first-page":"579","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation","author":"Mohan Jayashree","year":"2022","unstructured":"Jayashree Mohan, Amar Phanishayee, Janardhan Kulkarni, and Vijay Chidambaram. 2022. Looking beyond GPUs for DNN scheduling on multi-tenant clusters. In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation. 579\u2013596."},{"key":"e_1_3_1_34_2","first-page":"579","volume-title":"Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation","author":"Mohan Jayashree","year":"2022","unstructured":"Jayashree Mohan, Amar Phanishayee, Janardhan Kulkarni, and Vijay Chidambaram. 2022. Looking beyond GPUs for DNN scheduling on multi-tenant clusters. In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation. USENIX Association, Carlsbad, CA, 579\u2013596. Retrieved from https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/mohan"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.5555\/3488766.3488793"},{"key":"e_1_3_1_36_2","unstructured":"NVIDIA Corporation. 2023. NVIDIA Multi-Process Service Documentation. (2023). Retrieved June 23 2024 from https:\/\/docs.nvidia.com\/deploy\/mps\/"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","unstructured":"Jason Ansel Edward Yang Horace He Natalia Gimelshein Animesh Jain Michael Voznesensky Bin Bao Peter Bell David Berard Evgeni Burovski et\u00a0al. 2024. PyTorch 2: Faster machine learning through dynamic Python Bytecode transformation and graph compilation. 
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems Volume 2 (La Jolla CA USA) (ASPLOS\u201924). Association for Computing Machinery New York NY USA 929\u2013947. DOI:10.1145\/3620665.3640366","DOI":"10.1145\/3620665.3640366"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00025"},{"key":"e_1_3_1_40_2","unstructured":"Princeton University. 2024. Princeton invests in new 300-GPU cluster for academic AI research. (2024). Retrieved October 12 2023 from https:\/\/ai.princeton.edu\/news\/2024\/princeton-invests-new-300-gpu-cluster-academic-ai-research"},{"key":"e_1_3_1_41_2","first-page":"1","volume-title":"Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation","author":"Qiao Aurick","year":"2021","unstructured":"Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing. 2021. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation. 1\u201318."},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","unstructured":"Pranav Rajpurkar Robin Jia and Percy Liang. 2018. Know what you don\u2019t know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822 (2018). Accessed: 2024-06-30.","DOI":"10.18653\/v1\/P18-2124"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_44_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_1_45_2","first-page":"6105","volume-title":"Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"97","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research). Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97, PMLR, 6105\u20136114. Retrieved from https:\/\/proceedings.mlr.press\/v97\/tan19a.html"},{"key":"e_1_3_1_46_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An Instruction-following LLaMA model. Retrieved June 30 2024 from https:\/\/github.com\/tatsu-lab\/stanford_alpaca. (2023)."},{"key":"e_1_3_1_47_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved June 30 2024 from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_1_48_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved June 30 2024 from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_1_49_2","volume-title":"Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation","author":"Weng Qizhen","year":"2022","unstructured":"Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, and Yu Ding. 2022. 
MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters. In Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation."},{"key":"e_1_3_1_50_2","first-page":"995","volume-title":"Proceedings of the 2023 USENIX Annual Technical Conference","author":"Weng Qizhen","year":"2023","unstructured":"Qizhen Weng, Lingyun Yang, Yinghao Yu, Wei Wang, Xiaochuan Tang, Guodong Yang, and Liping Zhang. 2023. Beware of fragmentation: Scheduling GPU-sharing workloads with fragmentation gradient descent. In Proceedings of the 2023 USENIX Annual Technical Conference. USENIX Association, Boston, MA, 995\u20131008. Retrieved from https:\/\/www.usenix.org\/conference\/atc23\/presentation\/weng"},{"key":"e_1_3_1_51_2","first-page":"595","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, et\u00a0al. 2018. Gandiva: Introspective cluster scheduling for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation. 595\u2013610."},{"key":"e_1_3_1_52_2","first-page":"533","volume-title":"Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. 2020. AntMan: Dynamic scaling on GPU clusters for deep learning. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation. 
533\u2013548."},{"issue":"1","key":"e_1_3_1_53_2","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1109\/TPDS.2021.3079202","article-title":"Horus: Interference-aware and prediction-based scheduling in deep learning systems","volume":"33","author":"Yeung Gingfung","year":"2021","unstructured":"Gingfung Yeung, Damian Borowiec, Renyu Yang, Adrian Friday, Richard Harper, and Peter Garraghan. 2021. Horus: Interference-aware and prediction-based scheduling in deep learning systems. IEEE Transactions on Parallel and Distributed Systems 33, 1 (2021), 88\u2013100.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_1_54_2","first-page":"217","volume-title":"Proceedings of the 2022 USENIX Annual Technical Conference","author":"Zhang Wei","year":"2022","unstructured":"Wei Zhang, Binghao Chen, Zhenhua Han, Quan Chen, Peng Cheng, Fan Yang, Ran Shu, Yuqing Yang, and Minyi Guo. 2022. PilotFish: Harvesting free cycles of cloud gaming with deep learning training. In Proceedings of the 2022 USENIX Annual Technical Conference. USENIX Association, Carlsbad, CA, 217\u2013232. Retrieved from https:\/\/www.usenix.org\/conference\/atc22\/presentation\/zhang-wei"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3552326.3567499"},{"key":"e_1_3_1_56_2","first-page":"515","volume-title":"Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation","author":"Zhao Hanyu","year":"2020","unstructured":"Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis CM Lau, Yuqi Wang, Yifan Xiong, et\u00a0al. 2020. HiveD: Sharing a GPU cluster for deep learning with guarantees. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation. 
515\u2013532."}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711927","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3711927","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:10Z","timestamp":1750295890000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3711927"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,21]]},"references-count":55,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3711927"],"URL":"https:\/\/doi.org\/10.1145\/3711927","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,21]]},"assertion":[{"value":"2024-07-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-21","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}