{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T09:51:14Z","timestamp":1773481874897,"version":"3.50.1"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T00:00:00Z","timestamp":1686614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2021ZD0110202"],"award-info":[{"award-number":["2021ZD0110202"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,6,13]]},"abstract":"<jats:p>Training data pre-processing pipelines are essential to deep learning (DL). As the performance of model training keeps increasing with both hardware advancements (e.g., faster GPUs) and various software optimizations, the data pre-processing on CPUs is becoming more resource-intensive and a severe bottleneck of the pipeline. This problem is even worse in the cloud, where training jobs exhibit diverse CPU-GPU demands that usually result in mismatches with fixed hardware configurations and resource fragmentation, degrading both training performance and cluster utilization.<\/jats:p>\n          <jats:p>We introduce GoldMiner, an input data processing service for stateless operations used in pre-processing data for DL model training. GoldMiner decouples data pre-processing from model training into a new role called the data worker. Data workers facilitate scaling of data pre-processing to anywhere in a cluster, effectively pooling the resources across the cluster to satisfy the diverse requirements of training jobs. GoldMiner achieves this decoupling in a fully automatic and elastic manner. The key insight is that data pre-processing is inherently stateless, thus can be executed independently and elastically. This insight guides GoldMiner to automatically extract stateless computation out of a monolithic training program, efficiently disaggregate it across data workers, and elastically scale data workers to tune the resource allocations across jobs to optimize cluster efficiency. We have applied GoldMiner to industrial workloads, and our evaluation shows that GoldMiner can transform unmodified training programs to use data workers, accelerating individual training jobs by up to 12.1x. GoldMiner also improves average job completion time and aggregate GPU utilization by up to 2.5x and 2.1x in a 64-GPU cluster, respectively, by scheduling data workers with elasticity.<\/jats:p>","DOI":"10.1145\/3589773","type":"journal-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T20:26:45Z","timestamp":1687292805000},"page":"1-25","source":"Crossref","is-referenced-by-count":8,"title":["GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2536-0016","authenticated-orcid":false,"given":"Hanyu","family":"Zhao","sequence":"first","affiliation":[{"name":"Alibaba Group &amp; Peking University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8219-4499","authenticated-orcid":false,"given":"Zhi","family":"Yang","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-9436-674X","authenticated-orcid":false,"given":"Yu","family":"Cheng","sequence":"additional","affiliation":[{"name":"Peking University &amp; Alibaba Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8196-7815","authenticated-orcid":false,"given":"Chao","family":"Tian","sequence":"additional","affiliation":[{"name":"Peking University &amp; Alibaba Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2430-2009","authenticated-orcid":false,"given":"Shiru","family":"Ren","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3043-522X","authenticated-orcid":false,"given":"Wencong","family":"Xiao","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2045-4408","authenticated-orcid":false,"given":"Man","family":"Yuan","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-4800-3782","authenticated-orcid":false,"given":"Langshi","family":"Chen","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9302-4953","authenticated-orcid":false,"given":"Kaibo","family":"Liu","sequence":"additional","affiliation":[{"name":"Peking University &amp; Alibaba Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2036-9764","authenticated-orcid":false,"given":"Yang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9072-3170","authenticated-orcid":false,"given":"Yong","family":"Li","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3003-0150","authenticated-orcid":false,"given":"Wei","family":"Lin","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2023,6,20]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"DeepRec. https:\/\/github.com\/alibaba\/deeprec."},{"key":"e_1_2_2_2_1","unstructured":"HybridBackend. https:\/\/github.com\/alibaba\/HybridBackend."},{"key":"e_1_2_2_3_1","unstructured":"NVIDIA DALI. https:\/\/github.com\/NVIDIA\/DALI."},{"key":"e_1_2_2_4_1","unstructured":"NVIDIA H100. https:\/\/www.nvidia.com\/en-us\/data-center\/h100\/."},{"key":"e_1_2_2_5_1","unstructured":"tf.data.experimental.service. https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/data\/experimental\/service."},{"key":"e_1_2_2_6_1","unstructured":"TorchDynamo. https:\/\/pytorch.org\/docs\/master\/dynamo\/."},{"key":"e_1_2_2_7_1","volume-title":"16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)","author":"Looking","year":"2022","unstructured":"Looking beyond GPUs for DNN scheduling on Multi-Tenant clusters. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), Carlsbad, CA, July 2022. USENIX Association."},{"key":"e_1_2_2_8_1","first-page":"265","volume-title":"TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","volume":"16","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), volume 16, pages 265--283. USENIX Association, 2016."},{"key":"e_1_2_2_9_1","volume-title":"A case for disaggregation of ml data processing. arXiv preprint arXiv:2210.14826","author":"Audibert Andrew","year":"2022","unstructured":"Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Jiri Simsa, and Chandramohan A Thekkath. A case for disaggregation of ml data processing. arXiv preprint arXiv:2210.14826, 2022."},{"key":"e_1_2_2_10_1","volume-title":"Chandramohan A. Thekkath, and Yonghui Wu. Pathways: Asynchronous distributed dataflow for ml. CoRR, abs\/2203.12533","author":"Barham Paul","year":"2022","unstructured":"Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Dan Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, and Yonghui Wu. Pathways: Asynchronous distributed dataflow for ml. CoRR, abs\/2203.12533, 2022."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2898442.2898444"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387555"},{"key":"e_1_2_2_13_1","first-page":"578","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 578--594, Carlsbad, CA, October 2018. USENIX Association."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00359"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_2_17_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers), pages 4171--4186. Association for Computational Linguistics, 2019."},{"key":"e_1_2_2_18_1","first-page":"689","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Graur Dan","year":"2022","unstructured":"Dan Graur, Damien Aymon, Dan Kluser, Tanguy Albrici, Chandramohan A Thekkath, and Ana Klimovic. Cachew: Machine learning input data processing as a service. In 2022 USENIX Annual Technical Conference (USENIX ATC 22), pages 689--706, 2022."},{"key":"e_1_2_2_19_1","volume-title":"Elastic model aggregation with parameter service. CoRR, abs\/2204.03211","author":"Gu Juncheng","year":"2022","unstructured":"Juncheng Gu, Mosharaf Chowdhury, Kang G. Shin, and Aditya Akella. Elastic model aggregation with parameter service. CoRR, abs\/2204.03211, 2022."},{"key":"e_1_2_2_20_1","first-page":"485","volume-title":"16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19)","author":"Gu Juncheng","year":"2019","unstructured":"Juncheng Gu, Mosharaf Chowdhury, Kang G Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo. Tiresias: A GPU cluster manager for distributed deep learning. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 485--500, 2019."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/3172077.3172127"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517848"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517848"},{"key":"e_1_2_2_25_1","first-page":"947","volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Jeon Myeongjae","year":"2019","unstructured":"Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. Analysis of large-scale multi-tenant GPU clusters for DNN training workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 947--960, 2019."},{"key":"e_1_2_2_26_1","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Jia Xianyan","year":"2022","unstructured":"Xianyan Jia, Le Jiang, Ang Wang, Wencong Xiao, Ziji Shi, Jie Zhang, Xinyuan Li, Langshi Chen, Yong Li, Zhen Zheng, Xiaoyong Liu, and Wei Lin. Whale: Efficient giant model training over heterogeneous GPUs. In 2022 USENIX Annual Technical Conference (USENIX ATC 22), Carlsbad, CA, July 2022. USENIX Association."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359630"},{"key":"e_1_2_2_28_1","volume-title":"SysML 2019","author":"Jia Zhihao","year":"2019","unstructured":"Zhihao Jia, James Thomas, Tod Warszawski, Mingyu Gao, Matei Zaharia, and Alex Aiken. Optimizing dnn computation with relaxed graph substitutions. SysML 2019, 2019."},{"key":"e_1_2_2_29_1","first-page":"1","volume":"1","author":"Jia Zhihao","year":"2019","unstructured":"Zhihao Jia, Matei Zaharia, and Alex Aiken. Beyond data and model parallelism for deep neural networks. In A. Talwalkar, V. Smith, and M. Zaharia, editors, Proceedings of Machine Learning and Systems, volume 1, pages 1--13, 2019.","journal-title":"editors, Proceedings of Machine Learning and Systems"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/3357034.3357049"},{"key":"e_1_2_2_31_1","first-page":"1097","volume-title":"Advances in neural information processing systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet Classification with Deep Convolutional neural Networks. In Advances in neural information processing systems, pages 1097--1105, 2012."},{"key":"e_1_2_2_32_1","volume-title":"MLIR: A compiler infrastructure for the end of moore's law. CoRR, abs\/2002.11054","author":"Lattner Chris","year":"2020","unstructured":"Chris Lattner, Jacques A. Pienaar, Mehdi Amini, Uday Bondhugula, River Riddle, Albert Cohen, Tatiana Shpeisman, Andy Davis, Nicolas Vasilache, and Oleksandr Zinenko. MLIR: A compiler infrastructure for the end of moore's law. CoRR, abs\/2002.11054, 2020."},{"key":"e_1_2_2_33_1","first-page":"881","volume-title":"14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)","author":"Ma Lingxiao","year":"2020","unstructured":"Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, and Lidong Zhou. Rammer: Enabling holistic deep learning compiler optimizations with rtasks. In 14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20), pages 881--897, 2020."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767755"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3446095.3446100"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522738"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476374"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},{"key":"e_1_2_2_40_1","first-page":"481","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Narayanan Deepak","year":"2020","unstructured":"Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, and Matei Zaharia. Heterogeneity-aware cluster scheduling policies for deep learning workloads. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 481--498. USENIX Association, November 2020."},{"key":"e_1_2_2_41_1","first-page":"8024","volume-title":"Advances in Neural Information Processing Systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8024--8035, 2019."},{"key":"e_1_2_2_42_1","volume-title":"Chuanxiong Guo. Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters. In Proceedings of the Thirteenth European Conference on Computer Systems. ACM","author":"Peng Yanghua","year":"2018","unstructured":"Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, and Chuanxiong Guo. Optimus: An Efficient Dynamic Resource Scheduler for Deep Learning Clusters. In Proceedings of the Thirteenth European Conference on Computer Systems. ACM, 2018."},{"key":"e_1_2_2_43_1","volume-title":"The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621","author":"Perez Luis","year":"2017","unstructured":"Luis Perez and Jason Wang. The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621, 2017."},{"key":"e_1_2_2_44_1","first-page":"1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Qiao Aurick","year":"2021","unstructured":"Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, and Eric P. Xing. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pages 1--18. USENIX Association, July 2021."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_2_2_46_1","volume-title":"et al. Singularity: Planet-scale, preemptible, elastic scheduling of ai workloads. CoRR, abs\/2202.07848","author":"Shukla Dharma","year":"2022","unstructured":"Dharma Shukla, Muthian Sivathanu, Srinidhi Viswanatha, Bhargav Gulavani, Rimma Nehme, Amey Agrawal, Chen Chen, Nipun Kwatra, Ramachandran Ramjee, Pankaj Sharma, et al. Singularity: Planet-scale, preemptible, elastic scheduling of ai workloads. CoRR, abs\/2202.07848, 2022."},{"key":"e_1_2_2_47_1","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, 2015."},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_2_2_49_1","first-page":"37","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Wang Haojie","year":"2021","unstructured":"Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, Liyan Zheng, Yuanzhi Li, Kaiyuan Rong, Yuanyong Chen, and Zhihao Jia. {PET}: Optimizing tensor programs with partially equivalent transformations and automated corrections. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21), pages 37--54, 2021."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303953"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3450078"},{"key":"e_1_2_2_52_1","first-page":"595","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. Gandiva: Introspective cluster scheduling for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8--10, 2018, pages 595--610. USENIX Association, 2018."},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3275445"},{"key":"e_1_2_2_54_1","first-page":"533","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. Antman: Dynamic scaling on GPU clusters for deep learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 533--548. USENIX Association, November 2020."},{"key":"e_1_2_2_55_1","volume-title":"Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12. USENIX","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12. USENIX, 2012."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00324"},{"key":"e_1_2_2_57_1","first-page":"515","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Zhao Hanyu","year":"2020","unstructured":"Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis C.M. Lau, Yuqi Wang, Yifan Xiong, and Bin Wang. Hived: Sharing a GPU cluster for deep learning with guarantees. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 515--532. USENIX Association, November 2020."},{"key":"e_1_2_2_58_1","volume-title":"Understanding and co-designing the data ingestion pipeline for industry-scale recsys training. arXiv preprint arXiv:2108.09373, page 4","author":"Zhao Mark","year":"2021","unstructured":"Mark Zhao, Niket Agarwal, Aarti Basant, Bugra Gedik, Satadru Pan, Mustafa Ozdal, Rakesh Komuravelli, Jerry Pan, Tianshu Bao, Haowei Lu, et al. Understanding and co-designing the data ingestion pipeline for industry-scale recsys training. arXiv preprint arXiv:2108.09373, page 4, 2021."},{"key":"e_1_2_2_59_1","volume-title":"Understanding and co-designing the data ingestion pipeline for industry-scale recsys training. CoRR, abs\/2108.09373","author":"Zhao Mark","year":"2021","unstructured":"Mark Zhao, Niket Agarwal, Aarti Basant, Bugra Gedik, Satadru Pan, Mustafa Ozdal, Rakesh Komuravelli, Jerry Pan, Tianshu Bao, Haowei Lu, Sundaram Narayanan, Jack Langman, Kevin Wilfong, Harsha Rastogi, Carole-Jean Wu, Christos Kozyrakis, and Parik Pol. Understanding and co-designing the data ingestion pipeline for industry-scale recsys training. CoRR, abs\/2108.09373, 2021."},{"key":"e_1_2_2_60_1","unstructured":"Lianmin Zheng Chengfan Jia Minmin Sun Zhao Wu Cody Hao Yu Ameer Haj-Ali Yida Wang Jun Yang Danyang Zhuo Koushik Sen et al. Ansor: Generating high-performance tensor programs for deep learning. In 14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20) pages 863--879 2020."},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507723"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219823"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589773","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3589773","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:22Z","timestamp":1750182562000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3589773"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,13]]},"references-count":62,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,13]]}},"alternative-id":["10.1145\/3589773"],"URL":"https:\/\/doi.org\/10.1145\/3589773","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,13]]}}}