{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,22]],"date-time":"2026-07-22T15:26:27Z","timestamp":1784733987259,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":64,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,11,11]],"date-time":"2023-11-11T00:00:00Z","timestamp":1699660800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022ZD0117805"],"award-info":[{"award-number":["2022ZD0117805"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072018"],"award-info":[{"award-number":["62072018"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U22A2028"],"award-info":[{"award-number":["U22A2028"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fundamental Research Funds for the Central Universities"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,11,12]]},"DOI":"10.1145\/3581784.3607054","type":"proceedings-article","created":{"date-parts":[[2023,11,14]],"date-time":"2023-11-14T21:47:06Z","timestamp":1699998426000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on 
GPUs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4115-9072","authenticated-orcid":false,"given":"Mingzhen","family":"Li","sequence":"first","affiliation":[{"name":"Beihang University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3043-522X","authenticated-orcid":false,"given":"Wencong","family":"Xiao","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1101-7927","authenticated-orcid":false,"given":"Hailong","family":"Yang","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-7100-0866","authenticated-orcid":false,"given":"Biao","family":"Sun","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2536-0016","authenticated-orcid":false,"given":"Hanyu","family":"Zhao","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2430-2009","authenticated-orcid":false,"given":"Shiru","family":"Ren","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7186-0556","authenticated-orcid":false,"given":"Zhongzhi","family":"Luan","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8663-0653","authenticated-orcid":false,"given":"Xianyan","family":"Jia","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1829-2817","authenticated-orcid":false,"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9072-3170","authenticated-orcid":false,"given":"Yong","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3003-0150","authenticated-orcid":false,"given":"Wei","family":"Lin","sequence":"additional","affiliation":[{"name":"Alibaba Group, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5382-1473","authenticated-orcid":false,"given":"Depei","family":"Qian","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,11,11]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2023. Auto kernel selection of NVIDIA cuDNN. https:\/\/docs.nvidia.com\/deeplearning\/cudnn\/api\/index.html#cudnnGetConvolutionBackwardDataAlgorithm_v7."},{"key":"e_1_3_2_1_2_1","unstructured":"2023. Auto selection of cuBLAS in Ampere GPUs. https:\/\/docs.nvidia.com\/cuda\/cublas\/index.html#cublas-GemmEx."},{"key":"e_1_3_2_1_3_1","unstructured":"2023. Elastic Horovod. https:\/\/github.com\/horovod\/horovod\/blob\/master\/docs\/elastic.rst."},
{"key":"e_1_3_2_1_4_1","unstructured":"2023. ElasticDL https:\/\/github.com\/sql-machine-learning\/elasticdl\/."},{"key":"e_1_3_2_1_5_1","unstructured":"2023. NCCL deterministic. https:\/\/github.com\/NVIDIA\/nccl\/issues\/157."},{"key":"e_1_3_2_1_6_1","unstructured":"2023. NVIDIA Framework-determinism https:\/\/github.com\/NVIDIA\/framework-determinism."},{"key":"e_1_3_2_1_7_1","unstructured":"2023. PyTorch Reproducibility. https:\/\/pytorch.org\/docs\/stable\/notes\/randomness.html."},{"key":"e_1_3_2_1_8_1","unstructured":"2023. torch.backends.cudnn.benchmark. https:\/\/pytorch.org\/docs\/stable\/backends.html."},{"key":"e_1_3_2_1_9_1","unstructured":"2023. TorchElastic. https:\/\/pytorch.org\/docs\/stable\/distributed.elastic.html."},{"key":"e_1_3_2_1_10_1","volume-title":"TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. 
In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, Savannah, GA, 265--283."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492321.3519584"},{"key":"e_1_3_2_1_12_1","volume-title":"PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Bai Zhihao","year":"2020","unstructured":"Zhihao Bai, Zhen Zhang, Yibo Zhu, and Xin Jin. 2020. PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 499--514."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of Machine Learning and Systems, D. Marculescu, Y. Chi, and C. Wu (Eds.)","volume":"4","author":"Barham Paul","year":"2022","unstructured":"Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Daniel Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent Shafey, Chandu Thekkath, and Yonghui Wu. 2022. Pathways: Asynchronous Distributed Dataflow for ML. 
In Proceedings of Machine Learning and Systems, D. Marculescu, Y. Chi, and C. Wu (Eds.), Vol. 4. 430--449."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3480859"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2898442.2898444"},{"key":"e_1_3_2_1_16_1","volume-title":"Apache flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36, 4","author":"Carbone Paris","year":"2015","unstructured":"Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36, 4 (2015)."},{"key":"e_1_3_2_1_17_1","volume-title":"8th International Conference on Learning Representations, ICLR 2020","author":"Clark Kevin","year":"2020","unstructured":"Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020. OpenReview.net. https:\/\/openreview.net\/forum?id=r1xMH1BtvB"},
{"key":"e_1_3_2_1_18_1","volume-title":"MapReduce: Simplified Data Processing on Large Clusters. In 6th Symposium on Operating Systems Design & Implementation (OSDI 04)","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified Data Processing on Large Clusters. In 6th Symposium on Operating Systems Design & Implementation (OSDI 04). USENIX Association, San Francisco, CA."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_20_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, USA, 4171--4186."},
{"key":"e_1_3_2_1_21_1","volume-title":"Christopher KI Williams, John Winn, and Andrew Zisserman.","author":"Everingham Mark","year":"2010","unstructured":"Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. 2010. The pascal visual object classes (voc) challenge. International journal of computer vision 88, 2 (2010), 303--338."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437801.3441593"},{"key":"e_1_3_2_1_23_1","volume-title":"2020 IEEE\/ACM 42nd International Conference on Software Engineering (ICSE). IEEE","author":"Gerasimou Simos","year":"2020","unstructured":"Simos Gerasimou, Hasan Ferit Eniser, Alper Sen, and Alper Cakan. 2020. Importance-driven deep learning system testing. In 2020 IEEE\/ACM 42nd International Conference on Software Engineering (ICSE). IEEE, Seoul, Korea (South), 702--713."},{"key":"e_1_3_2_1_24_1","volume-title":"Large Minibatch SGD: Training ImageNet in 1 Hour. CoRR abs\/1706.02677","author":"Goyal Priya","year":"2017","unstructured":"Priya Goyal, Piotr Doll\u00e1r, Ross B. Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. 
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. CoRR abs\/1706.02677 (2017). arXiv:1706.02677 http:\/\/arxiv.org\/abs\/1706.02677"},{"key":"e_1_3_2_1_25_1","volume-title":"Elastic Model Aggregation with Parameter Service. CoRR abs\/2204.03211","author":"Gu Juncheng","year":"2022","unstructured":"Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, and Aditya Akella. 2022. Elastic Model Aggregation with Parameter Service. CoRR abs\/2204.03211 (2022). arXiv:2204.03211 https:\/\/arxiv.org\/abs\/2204.03211"},{"key":"e_1_3_2_1_26_1","volume-title":"Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Han Mingcong","year":"2022","unstructured":"Mingcong Han, Hanze Zhang, Rong Chen, and Haibo Chen. 2022. Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). USENIX Association, Carlsbad, CA, 539--558. https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/han"},
{"key":"e_1_3_2_1_27_1","volume-title":"The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5, 4","author":"Maxwell Harper F","year":"2015","unstructured":"F Maxwell Harper and Joseph A Konstan. 2015. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis) 5, 4 (2015), 1--19."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052569"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273005"},{"key":"e_1_3_2_1_31_1","volume-title":"Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Jeon Myeongjae","year":"2019","unstructured":"Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. 2019. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). 947--960."},{"key":"e_1_3_2_1_33_1","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Jia Xianyan","year":"2022","unstructured":"Xianyan Jia, Le Jiang, Ang Wang, Wencong Xiao, Ziji Shi, Jie Zhang, Xinyuan Li, Langshi Chen, Yong Li, Zhen Zheng, Xiaoyong Liu, and Wei Lin. 2022. 
Whale: Efficient Giant Model Training over Heterogeneous GPUs. In 2022 USENIX Annual Technical Conference (USENIX ATC 22). USENIX Association, Carlsbad, CA, 673--688."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415530"},{"key":"e_1_3_2_1_35_1","volume-title":"Zico: Efficient GPU Memory Sharing for Concurrent DNN Training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Lim Gangmuk","year":"2021","unstructured":"Gangmuk Lim, Jeongseob Ahn, Wencong Xiao, Youngjin Kwon, and Myeongjae Jeon. 2021. Zico: Efficient GPU Memory Sharing for Concurrent DNN Training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). USENIX Association, 161--175."},{"key":"e_1_3_2_1_36_1","volume-title":"StreamScope: Continuous Reliable Distributed Processing of Big Data Streams. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16)","author":"Lin Wei","year":"2016","unstructured":"Wei Lin, Zhengping Qian, Junwei Xu, Sen Yang, Jingren Zhou, and Lidong Zhou. 2016. StreamScope: Continuous Reliable Distributed Processing of Big Data Streams. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 439--453."},
{"key":"e_1_3_2_1_37_1","volume-title":"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. CoRR abs\/2103.14030","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. CoRR abs\/2103.14030 (2021). arXiv:2103.14030 https:\/\/arxiv.org\/abs\/2103.14030"},{"key":"e_1_3_2_1_38_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Ma Lingxiao","year":"2020","unstructured":"Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, and Lidong Zhou. 2020. Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 881--897. https:\/\/www.usenix.org\/conference\/osdi20\/presentation\/ma"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"e_1_3_2_1_40_1","volume-title":"KungFu: Making Training in Distributed Machine Learning Adaptive. 
In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Mai Luo","year":"2020","unstructured":"Luo Mai, Guo Li, Marcel Wagenl\u00e4nder, Konstantinos Fertakis, Andrei-Octavian Brabete, and Peter Pietzuch. 2020. KungFu: Making Training in Distributed Machine Learning Adaptive. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 937--954."},{"key":"e_1_3_2_1_41_1","volume-title":"Deterministic implementations for reproducibility in deep reinforcement learning. arXiv preprint arXiv:1809.05676","author":"Nagarajan Prabhat","year":"2018","unstructured":"Prabhat Nagarajan, Garrett Warnell, and Peter Stone. 2018. Deterministic implementations for reproducibility in deep reinforcement learning. arXiv preprint arXiv:1809.05676 (2018)."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_3_2_1_43_1","volume-title":"Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Narayanan Deepak","year":"2020","unstructured":"Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, and Matei Zaharia. 2020. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 481--498."},
{"key":"e_1_3_2_1_44_1","volume-title":"Proceedings of Machine Learning and Systems, D. Marculescu, Y. Chi, and C. Wu (Eds.)","volume":"4","author":"Or Andrew","year":"2022","unstructured":"Andrew Or, Haoyu Zhang, and Michael J. Freedman. 2022. VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware. In Proceedings of Machine Learning and Systems, D. Marculescu, Y. Chi, and C. Wu (Eds.), Vol. 4. 126--140."},{"key":"e_1_3_2_1_45_1","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style high-performance deep learning library. In Advances in Neural Information Processing Systems. 8024--8035."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190517"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416545"},{"key":"e_1_3_2_1_48_1","volume-title":"Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training. 
Advances in Neural Information Processing Systems 34","author":"Qian Shangshu","year":"2021","unstructured":"Shangshu Qian, Hung Pham, Thibaud Lutellier, Zeou Hu, Jungwon Kim, Lin Tan, Yaoliang Yu, Jiahao Chen, and Sameena Shah. 2021. Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training. Advances in Neural Information Processing Systems 34 (2021)."},{"key":"e_1_3_2_1_49_1","volume-title":"Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Qiao Aurick","unstructured":"Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing. 2021. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). USENIX Association, 1--18."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/d16-1264"},{"key":"e_1_3_2_1_52_1","volume-title":"YOLOv3: An Incremental Improvement. CoRR abs\/1804.02767","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An Incremental Improvement. CoRR abs\/1804.02767 (2018). 
arXiv:1804.02767 http:\/\/arxiv.org\/abs\/1804.02767"},{"key":"e_1_3_2_1_53_1","volume-title":"Singularity: Planet-Scale, Preemptible, Elastic Scheduling of AI Workloads. CoRR abs\/2202.07848","author":"Shukla Dharma","year":"2022","unstructured":"Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, et al. 2022. Singularity: Planet-Scale, Preemptible, Elastic Scheduling of AI Workloads. CoRR abs\/2202.07848 (2022)."},{"key":"e_1_3_2_1_54_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings. http:\/\/arxiv.org\/abs\/1409.1556"},
{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304072"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741964"},{"key":"e_1_3_2_1_57_1","volume-title":"MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Weng Qizhen","year":"2022","unstructured":"Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, and Yu Ding. 2022. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 945--960."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3064966"},{"key":"e_1_3_2_1_59_1","volume-title":"Gandiva: Introspective Cluster Scheduling for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. 2018. Gandiva: Introspective Cluster Scheduling for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8--10, 2018. USENIX Association, 595--610. 
https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/xiao Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. 2018. Gandiva: Introspective Cluster Scheduling for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8--10, 2018. USENIX Association, 595--610. https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/xiao"},{"key":"e_1_3_2_1_60_1","volume-title":"AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao , Shiru Ren , Yong Li , Yang Zhang , Pengyang Hou , Zhi Li , Yihui Feng , Wei Lin , and Yangqing Jia . 2020 . AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) . USENIX Association, 533--548. Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. 2020. AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 533--548."},{"key":"e_1_3_2_1_61_1","volume-title":"Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia , Mosharaf Chowdhury , Tathagata Das , Ankur Dave , Justin Ma , Murphy McCauley , Michael J. Franklin , Scott Shenker , and Ion Stoica . 2012 . Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing . In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation ( San Jose, CA) (NSDI'12). USENIX, 1 pages. 
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (San Jose, CA) (NSDI'12). USENIX, 1 pages."},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3127490"},{"key":"e_1_3_2_1_63_1","volume-title":"Retiarii: A Deep Learning Exploratory-Training Framework. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Zhang Quanlu","year":"2020","unstructured":"Quanlu Zhang , Zhenhua Han , Fan Yang , Yuge Zhang , Zhe Liu , Mao Yang , and Lidong Zhou . 2020 . Retiarii: A Deep Learning Exploratory-Training Framework. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) . USENIX Association, 919--936. Quanlu Zhang, Zhenhua Han, Fan Yang, Yuge Zhang, Zhe Liu, Mao Yang, and Lidong Zhou. 2020. Retiarii: A Deep Learning Exploratory-Training Framework. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 919--936."},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3158369"},{"key":"e_1_3_2_1_65_1","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Zhao Hanyu","year":"2020","unstructured":"Hanyu Zhao , Zhenhua Han , Zhi Yang , Quanlu Zhang , Fan Yang , Lidong Zhou , Mao Yang , Francis C.M. Lau , Yuqi Wang , Yifan Xiong , and Bin Wang . 2020 . HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees . In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) . USENIX Association, 515--532. Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis C.M. Lau, Yuqi Wang, Yifan Xiong, and Bin Wang. 2020. 
HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, 515--532."}],"event":{"name":"SC '23: International Conference for High Performance Computing, Networking, Storage and Analysis","location":"Denver CO USA","acronym":"SC '23","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","IEEE CS"]},"container-title":["Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3581784.3607054","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3581784.3607054","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:22Z","timestamp":1750178182000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3581784.3607054"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,11]]},"references-count":64,"alternative-id":["10.1145\/3581784.3607054","10.1145\/3581784"],"URL":"https:\/\/doi.org\/10.1145\/3581784.3607054","relation":{},"subject":[],"published":{"date-parts":[[2023,11,11]]},"assertion":[{"value":"2023-11-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}