{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:33:28Z","timestamp":1773246808234,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":53,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,11,1]],"date-time":"2021-11-01T00:00:00Z","timestamp":1635724800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,11]]},"DOI":"10.1145\/3472883.3486978","type":"proceedings-article","created":{"date-parts":[[2021,10,27]],"date-time":"2021-10-27T10:48:16Z","timestamp":1635331696000},"page":"609-623","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":47,"title":["Chronus"],"prefix":"10.1145","author":[{"given":"Wei","family":"Gao","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Nanyang Technological University and S-Lab, Nanyang Technological University"}]},{"given":"Zhisheng","family":"Ye","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Peng","family":"Sun","sequence":"additional","affiliation":[{"name":"SenseTime"}]},{"given":"Yonggang","family":"Wen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanyang Technological University"}]},{"given":"Tianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanyang Technological University"}]}],"member":"320","published-online":{"date-parts":[[2021,11]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. 
Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org.  Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dandelion Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Brendan Burns Brian Grant David Oppenheimer Eric Brewer and John Wilkes. 2016. Borg Omega and Kubernetes. ACM Queue (2016).  Brendan Burns Brian Grant David Oppenheimer Eric Brewer and John Wilkes. 2016. Borg Omega and Kubernetes. ACM Queue (2016).","DOI":"10.1145\/2890784"},{"key":"e_1_3_2_2_3_1","volume-title":"Hard real-time computing systems: predictable scheduling algorithms and applications","author":"Buttazzo Giorgio C","unstructured":"Giorgio C Buttazzo . 2011. Hard real-time computing systems: predictable scheduling algorithms and applications . 
Springer Science & Business Media . Giorgio C Buttazzo. 2011. Hard real-time computing systems: predictable scheduling algorithms and applications. Springer Science & Business Media."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387555"},{"key":"e_1_3_2_2_5_1","volume-title":"USENIX Annual Technical Conference.","author":"Chen Wei","year":"2017","unstructured":"Wei Chen , Jia Rao , and Xiaobo Zhou . 2017 . Preemptive, low latency datacenter scheduling via lightweight virtualization . In USENIX Annual Technical Conference. Wei Chen, Jia Rao, and Xiaobo Zhou. 2017. Preemptive, low latency datacenter scheduling via lightweight virtualization. In USENIX Annual Technical Conference."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2072298.2072362"},{"key":"e_1_3_2_2_7_1","volume-title":"Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters","author":"Chen Zhaoyun","year":"2019","unstructured":"Zhaoyun Chen , Wei Quan , Mei Wen , Jianbin Fang , Jie Yu , Chunyuan Zhang , and Lei Luo . 2019. Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters . IEEE Transactions on Parallel and Distributed Systems ( 2019 ). Zhaoyun Chen, Wei Quan, Mei Wen, Jianbin Fang, Jie Yu, Chunyuan Zhang, and Lei Luo. 2019. Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters. IEEE Transactions on Parallel and Distributed Systems (2019)."},{"key":"e_1_3_2_2_8_1","unstructured":"Gurobi Company. 2021. Gurobi Optimization: https:\/\/www.gurobi.com\/. https:\/\/www.gurobi.com\/  Gurobi Company. 2021. Gurobi Optimization: https:\/\/www.gurobi.com\/. 
https:\/\/www.gurobi.com\/"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2670979.2670981"},{"key":"e_1_3_2_2_10_1","volume-title":"Paragon: QoS-aware scheduling for heterogeneous datacenters. ACM SIGPLAN Notices","author":"Delimitrou Christina","year":"2013","unstructured":"Christina Delimitrou and Christos Kozyrakis . 2013 . Paragon: QoS-aware scheduling for heterogeneous datacenters. ACM SIGPLAN Notices (2013). Christina Delimitrou and Christos Kozyrakis. 2013. Paragon: QoS-aware scheduling for heterogeneous datacenters. ACM SIGPLAN Notices (2013)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541941"},{"key":"e_1_3_2_2_12_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2006.18"},{"key":"e_1_3_2_2_15_1","volume-title":"Generative adversarial networks. arXiv preprint arXiv:1406.2661","author":"Goodfellow Ian J","year":"2014","unstructured":"Ian J Goodfellow , Jean Pouget-Abadie , Mehdi Mirza , Bing Xu , David Warde-Farley , Sherjil Ozair , Aaron Courville , and Yoshua Bengio . 2014. Generative adversarial networks. arXiv preprint arXiv:1406.2661 ( 2014 ). Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial networks. 
arXiv preprint arXiv:1406.2661 (2014)."},{"key":"e_1_3_2_2_16_1","volume-title":"USENIX Symposium on Networked Systems Design and Implementation.","author":"Gu Juncheng","year":"2019","unstructured":"Juncheng Gu , Mosharaf Chowdhury , Kang G Shin , Yibo Zhu , Myeongjae Jeon , Junjie Qian , Hongqiang Liu , and Chuanxiong Guo . 2019 . Tiresias: A GPU cluster manager for distributed deep learning . In USENIX Symposium on Networked Systems Design and Implementation. Juncheng Gu, Mosharaf Chowdhury, Kang G Shin, Yibo Zhu, Myeongjae Jeon, Junjie Qian, Hongqiang Liu, and Chuanxiong Guo. 2019. Tiresias: A GPU cluster manager for distributed deep learning. In USENIX Symposium on Networked Systems Design and Implementation."},{"key":"e_1_3_2_2_17_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR.  Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR."},{"key":"e_1_3_2_2_18_1","volume-title":"Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861","author":"Howard Andrew G","year":"2017","unstructured":"Andrew G Howard , Menglong Zhu , Bo Chen , Dmitry Kalenichenko , Weijun Wang , Tobias Weyand , Marco Andreetto , and Hartwig Adam . 2017 . Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017). Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. 
arXiv preprint arXiv:1704.04861 (2017)."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476223"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391239"},{"key":"e_1_3_2_2_21_1","volume-title":"USENIX Annual Technical Conference.","author":"Jeon Myeongjae","year":"2019","unstructured":"Myeongjae Jeon , Shivaram Venkataraman , Amar Phanishayee , Junjie Qian , Wencong Xiao , and Fan Yang . 2019 . Analysis of large-scale multi-tenant GPU clusters for DNN training workloads . In USENIX Annual Technical Conference. Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang. 2019. Analysis of large-scale multi-tenant GPU clusters for DNN training workloads. In USENIX Annual Technical Conference."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLOUD.2019.00022"},{"key":"e_1_3_2_2_23_1","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation.","author":"Jyothi Sangeetha Abdu","year":"2016","unstructured":"Sangeetha Abdu Jyothi , Carlo Curino , Ishai Menache , Shravan Matthur Narayanamurthy , Alexey Tumanov , Jonathan Yaniv , Ruslan Mavlyutov , Inigo Goiri , Subru Krishnan , Janardhan Kulkarni , 2016 . Morpheus: Towards automated slos for enterprise clusters . In 12th USENIX Symposium on Operating Systems Design and Implementation. Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, Inigo Goiri, Subru Krishnan, Janardhan Kulkarni, et al. 2016. Morpheus: Towards automated slos for enterprise clusters. In 12th USENIX Symposium on Operating Systems Design and Implementation."},{"key":"e_1_3_2_2_24_1","unstructured":"Vijay R Konda and John N Tsitsiklis. 2000. Actor-critic algorithms. In Advances in neural information processing systems. Citeseer.  Vijay R Konda and John N Tsitsiklis. 2000. Actor-critic algorithms. 
In Advances in neural information processing systems. Citeseer."},{"key":"e_1_3_2_2_25_1","volume-title":"Hinton","author":"Krizhevsky Alex","year":"2017","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E . Hinton . 2017 . ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM ( 2017). Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM (2017)."},{"key":"e_1_3_2_2_26_1","unstructured":"Kubernetes contributors. 2021. Kubernetes: https:\/\/kubernetes.io\/. https:\/\/kubernetes.io\/  Kubernetes contributors. 2021. Kubernetes: https:\/\/kubernetes.io\/. https:\/\/kubernetes.io\/"},{"key":"e_1_3_2_2_27_1","unstructured":"MIT Distributed Robotics Laboratory. [n.d.]. Github repository https:\/\/github.com\/mit-drl\/goop: Generalized Mixed Integer Optimization in Go. https:\/\/github.com\/mit-drl\/goop  MIT Distributed Robotics Laboratory. [n.d.]. Github repository https:\/\/github.com\/mit-drl\/goop: Generalized Mixed Integer Optimization in Go. https:\/\/github.com\/mit-drl\/goop"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342195.3387547"},{"key":"e_1_3_2_2_29_1","volume-title":"DCloud: deadline-aware resource allocation for cloud computing jobs","author":"Li Dan","year":"2015","unstructured":"Dan Li , Congjie Chen , Junjie Guan , Ying Zhang , Jing Zhu , and Ruozhou Yu. 2015. DCloud: deadline-aware resource allocation for cloud computing jobs . IEEE transactions on parallel and distributed systems ( 2015 ). Dan Li, Congjie Chen, Junjie Guan, Ying Zhang, Jing Zhu, and Ruozhou Yu. 2015. DCloud: deadline-aware resource allocation for cloud computing jobs. 
IEEE transactions on parallel and distributed systems (2015)."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357223.3362719"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1855013"},{"key":"e_1_3_2_2_32_1","volume-title":"Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM","author":"Liu Chung Laung","year":"1973","unstructured":"Chung Laung Liu and James W Layland . 1973. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM ( 1973 ). Chung Laung Liu and James W Layland. 1973. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM (1973)."},{"key":"e_1_3_2_2_33_1","volume-title":"Chronos: Meeting coflow deadlines in data center networks. In International Conference on Communications.","author":"Ma Shiyao","year":"2016","unstructured":"Shiyao Ma , Jingjie Jiang , Bo Li , and Baochun Li . 2016 . Chronos: Meeting coflow deadlines in data center networks. In International Conference on Communications. Shiyao Ma, Jingjie Jiang, Bo Li, and Baochun Li. 2016. Chronos: Meeting coflow deadlines in data center networks. In International Conference on Communications."},{"key":"e_1_3_2_2_34_1","volume-title":"Themis: Fair and Efficient GPU Cluster Scheduling. In 17th USENIX Symposium on Networked Systems Design and Implementation.","author":"Mahajan Kshiteej","year":"2020","unstructured":"Kshiteej Mahajan , Arjun Balasubramanian , Arjun Singhvi , Shivaram Venkataraman , Aditya Akella , Amar Phanishayee , and Shuchi Chawla . 2020 . Themis: Fair and Efficient GPU Cluster Scheduling. In 17th USENIX Symposium on Networked Systems Design and Implementation. Kshiteej Mahajan, Arjun Balasubramanian, Arjun Singhvi, Shivaram Venkataraman, Aditya Akella, Amar Phanishayee, and Shuchi Chawla. 2020. Themis: Fair and Efficient GPU Cluster Scheduling. 
In 17th USENIX Symposium on Networked Systems Design and Implementation."},{"key":"e_1_3_2_2_35_1","volume-title":"19th USENIX Conference on File and Storage Technologies.","author":"Mohan Jayashree","year":"2021","unstructured":"Jayashree Mohan , Amar Phanishayee , and Vijay Chidambaram . 2021 . CheckFreq: Frequent, Fine-Grained {DNN} Checkpointing . In 19th USENIX Conference on File and Storage Technologies. Jayashree Mohan, Amar Phanishayee, and Vijay Chidambaram. 2021. CheckFreq: Frequent, Fine-Grained {DNN} Checkpointing. In 19th USENIX Conference on File and Storage Technologies."},{"key":"e_1_3_2_2_36_1","volume-title":"ACM SIGMOD International Conference on Management of data.","author":"Morton Kristi","year":"2010","unstructured":"Kristi Morton , Magdalena Balazinska , and Dan Grossman . 2010 . Para-Timer: a progress indicator for MapReduce DAGs . In ACM SIGMOD International Conference on Management of data. Kristi Morton, Magdalena Balazinska, and Dan Grossman. 2010. Para-Timer: a progress indicator for MapReduce DAGs. In ACM SIGMOD International Conference on Management of data."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Kristi Morton Abram Friesen Magdalena Balazinska and Dan Grossman. 2010. Estimating the progress of MapReduce pipelines. In International Conference on Data Engineering.  Kristi Morton Abram Friesen Magdalena Balazinska and Dan Grossman. 2010. Estimating the progress of MapReduce pipelines. In International Conference on Data Engineering.","DOI":"10.1109\/ICDE.2010.5447919"},{"key":"e_1_3_2_2_38_1","volume-title":"Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. In 14th USENIX Symposium on Operating Systems Design and Implementation.","author":"Narayanan Deepak","year":"2020","unstructured":"Deepak Narayanan , Keshav Santhanam , Fiodar Kazhamiaka , Amar Phanishayee , and Matei Zaharia . 2020 . Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. 
In 14th USENIX Symposium on Operating Systems Design and Implementation. Deepak Narayanan, Keshav Santhanam, Fiodar Kazhamiaka, Amar Phanishayee, and Matei Zaharia. 2020. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. In 14th USENIX Symposium on Operating Systems Design and Implementation."},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190515"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190517"},{"key":"e_1_3_2_2_42_1","volume-title":"Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. In 15th USENIX Symposium on Operating Systems Design and Implementation.","author":"Qiao Aurick","year":"2021","unstructured":"Aurick Qiao , Keun Choe Sang , Jayaram Subramanya Suhas , Neiswanger Willie , Qirong Ho , Hao Zhang , Gregory R Ganger , and Eric P Xing . 2021 . Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. In 15th USENIX Symposium on Operating Systems Design and Implementation. Aurick Qiao, Keun Choe Sang, Jayaram Subramanya Suhas, Neiswanger Willie, Qirong Ho, Hao Zhang, Gregory R Ganger, and Eric P Xing. 2021. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning. In 15th USENIX Symposium on Operating Systems Design and Implementation."},{"key":"e_1_3_2_2_43_1","volume-title":"Esteban Meneses, Leonardo Bautista Gomez, and Rosa M Badia.","author":"Rojas Elvis","year":"2020","unstructured":"Elvis Rojas , Albert Njoroge Kahira , Esteban Meneses, Leonardo Bautista Gomez, and Rosa M Badia. 2020 . A Study of Checkpointing in Large Scale Training of Deep Neural Networks . arXiv preprint arXiv:2012.00825 (2020). Elvis Rojas, Albert Njoroge Kahira, Esteban Meneses, Leonardo Bautista Gomez, and Rosa M Badia. 2020. A Study of Checkpointing in Large Scale Training of Deep Neural Networks. arXiv preprint arXiv:2012.00825 (2020)."},{"key":"e_1_3_2_2_44_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. 
Very deep convolutional networks for large-scale image recognition. In ICLR.  Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR."},{"key":"e_1_3_2_2_45_1","unstructured":"Richard S Sutton David A McAllester Satinder P Singh Yishay Mansour et al. 1999. Policy gradient methods for reinforcement learning with function approximation. In NIPS.  Richard S Sutton David A McAllester Satinder P Singh Yishay Mansour et al. 1999. Policy gradient methods for reinforcement learning with function approximation. In NIPS."},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_2_47_1","volume-title":"Michael A Kozuch, and Gregory R Ganger.","author":"Tumanov Alexey","year":"2016","unstructured":"Alexey Tumanov , Angela Jiang , Jun Woo Park , Michael A Kozuch, and Gregory R Ganger. 2016 . Jamaisvu : Robust scheduling with auto-estimated job runtimes. Parallel Data Laboratory, Carnegie Mellon University , Tech. Rep. (2016). Alexey Tumanov, Angela Jiang, Jun Woo Park, Michael A Kozuch, and Gregory R Ganger. 2016. Jamaisvu: Robust scheduling with auto-estimated job runtimes. Parallel Data Laboratory, Carnegie Mellon University, Tech. Rep. (2016)."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901318.2901355"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_3_2_2_50_1","volume-title":"Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. In USENIX Symposium on Networked Systems Design and Implementation (NSDI'16)","author":"Venkataraman Shivaram","year":"2016","unstructured":"Shivaram Venkataraman , Zongheng Yang , Michael Franklin , Benjamin Recht , and Ion Stoica . 2016 . Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. In USENIX Symposium on Networked Systems Design and Implementation (NSDI'16) . 
Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica. 2016. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. In USENIX Symposium on Networked Systems Design and Implementation (NSDI'16)."},{"key":"e_1_3_2_2_51_1","volume-title":"Mixed integer linear programming formulation techniques. Siam Review","author":"Vielma Juan Pablo","year":"2015","unstructured":"Juan Pablo Vielma . 2015. Mixed integer linear programming formulation techniques. Siam Review ( 2015 ). Juan Pablo Vielma. 2015. Mixed integer linear programming formulation techniques. Siam Review (2015)."},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386367.3432588"},{"key":"e_1_3_2_2_53_1","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation.","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao , Romil Bhardwaj , Ramachandran Ramjee , Muthian Sivathanu , Nipun Kwatra , Zhenhua Han , Pratyush Patel , Xuan Peng , Hanyu Zhao , Quanlu Zhang , 2018 . Gandiva: Introspective cluster scheduling for deep learning . In 13th USENIX Symposium on Operating Systems Design and Implementation. Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, et al. 2018. Gandiva: Introspective cluster scheduling for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation."},{"key":"e_1_3_2_2_54_1","volume-title":"USENIX Symposium on Operating Systems Design and Implementation.","author":"Xiao Wencong","year":"2020","unstructured":"Wencong Xiao , Shiru Ren , Yong Li , Yang Zhang , Pengyang Hou , Zhi Li , Yihui Feng , Wei Lin , and Yangqing Jia . 2020 . AntMan: Dynamic Scaling on {GPU} Clusters for Deep Learning . In USENIX Symposium on Operating Systems Design and Implementation. 
Wencong Xiao, Shiru Ren, Yong Li, Yang Zhang, Pengyang Hou, Zhi Li, Yihui Feng, Wei Lin, and Yangqing Jia. 2020. AntMan: Dynamic Scaling on {GPU} Clusters for Deep Learning. In USENIX Symposium on Operating Systems Design and Implementation."},{"key":"e_1_3_2_2_56_1","volume-title":"USENIX Symposium on Operating Systems Design and Implementation.","author":"Zhao Hanyu","year":"2020","unstructured":"Hanyu Zhao , Zhenhua Han , Zhi Yang , Quanlu Zhang , Fan Yang , Lidong Zhou , Mao Yang , Francis C.M. Lau , Yuqi Wang , Yifan Xiong , and Bin Wang . 2020 . HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees . In USENIX Symposium on Operating Systems Design and Implementation. Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis C.M. Lau, Yuqi Wang, Yifan Xiong, and Bin Wang. 2020. HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees. In USENIX Symposium on Operating Systems Design and Implementation."}],"event":{"name":"SoCC '21: ACM Symposium on Cloud Computing","location":"Seattle WA USA","acronym":"SoCC '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the ACM Symposium on Cloud Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472883.3486978","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472883.3486978","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:57Z","timestamp":1750191117000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472883.3486978"}},"subtitle":["A Novel Deadline-aware Scheduler for Deep Learning Training 
Jobs"],"short-title":[],"issued":{"date-parts":[[2021,11]]},"references-count":53,"alternative-id":["10.1145\/3472883.3486978","10.1145\/3472883"],"URL":"https:\/\/doi.org\/10.1145\/3472883.3486978","relation":{},"subject":[],"published":{"date-parts":[[2021,11]]},"assertion":[{"value":"2021-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}