{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T00:43:49Z","timestamp":1759797829796,"version":"build-2065373602"},"reference-count":116,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,3,31]]},"abstract":"<jats:p>Over the last ten years, the search for efficient scheduling algorithms that can cope with heterogeneous workload demands on large (warehouse) scale computers has reached a feverish tempo. We focus on examining scheduling techniques for highly parallelizable jobs on warehouse-scale computers through the lens of basic results in relevant fundamental theories. The objective of this survey is to connect the disparate scheduling ideas and approaches in the literature under a loose framework of mathematical results that can be used to compare superficially different scheduling methodologies under a common goal. As the mathematical problem is NP-Hard in general, we do not emphasize rigorous mathematical proof, rather, we advocate for the use of mathematical results to guide intuition. We provide readers with some basic tools to use in navigating the fragmented research around job scheduling for distributed applications. 
We also highlight some common misunderstandings of fundamental theory in the literature that are skewing results and may be limiting research progress.<\/jats:p>","DOI":"10.1145\/3766543","type":"journal-article","created":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T10:37:42Z","timestamp":1757587062000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Survey of Fundamental Principles and Analysis for Job Scheduling on Warehouse-Scale Computers"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-0307-2032","authenticated-orcid":false,"given":"Kevin","family":"Exton","sequence":"first","affiliation":[{"name":"School of Computing and Information Systems, The University of Melbourne","place":["Melbourne, Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2831-8526","authenticated-orcid":false,"given":"Maria Rodriguez","family":"Read","sequence":"additional","affiliation":[{"name":"School of Computing and Information Systems, The University of Melbourne","place":["Melbourne, Australia"]}]}],"member":"320","published-online":{"date-parts":[[2025,10,6]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1007\/978-3-030-33495-6_13","volume-title":"Proceedings of the High-Performance Computing and Big Data Analysis","author":"Aghdashi Arman","year":"2019","unstructured":"Arman Aghdashi and Seyedeh Leili Mirtaheri. 2019. A survey on load balancing in cloud systems for big data applications. In Proceedings of the High-Performance Computing and Big Data Analysis. Lucio Grandinetti, Seyedeh Leili Mirtaheri, and Reza Shahbazian (Eds.), Springer International Publishing, Cham, 156\u2013173. 
DOI:10.1007\/978-3-030-33495-6_13"},{"key":"e_1_3_2_3_2","first-page":"923","volume-title":"Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Akkus Istemi Ekin","year":"2018","unstructured":"Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. 2018. SAND: Towards high-performance serverless computing. In Proceedings of the 2018 USENIX Annual Technical Conference (USENIX ATC 18). USENIX Association, Boston, MA, 923\u2013935."},{"key":"e_1_3_2_4_2","first-page":"185","volume-title":"Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (nsdi\u201913)","author":"Ananthanarayanan Ganesh","year":"2013","unstructured":"Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. 2013. Effective straggler mitigation: Attack of the clones. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (nsdi\u201913). USENIX Association, USA, 185\u2013198."},{"key":"e_1_3_2_5_2","first-page":"289","volume-title":"Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14)","author":"Ananthanarayanan Ganesh","year":"2014","unstructured":"Ganesh Ananthanarayanan, Michael Chien-Chun Hung, Xiaoqi Ren, Ion Stoica, Adam Wierman, and Minlan Yu. 2014. GRASS: Trimming stragglers in approximation analytics. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14). USENIX Association, Seattle, WA, 289\u2013302."},{"key":"e_1_3_2_6_2","first-page":"265","volume-title":"Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201910)","author":"Ananthanarayanan Ganesh","year":"2010","unstructured":"Ganesh Ananthanarayanan, Srikanth Kandula, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha, and Edward Harris. 2010. Reining in the outliers in map-reduce clusters using Mantri. 
In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201910). USENIX Association, USA, 265\u2013278."},{"key":"e_1_3_2_7_2","unstructured":"The ZeroMQ authors. 2024. ZeroMQ: An open-source universal messaging library. (2024). Retrieved June 10 2024 from https:\/\/zeromq.org"},{"key":"e_1_3_2_8_2","first-page":"593","volume-title":"Proceedings of the 26th Annual ACM Symposium on Theory of Computing\u2014STOC \u201994","author":"Azar Yossi","year":"1994","unstructured":"Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. 1994. Balanced allocations (extended abstract). In Proceedings of the 26th Annual ACM Symposium on Theory of Computing\u2014STOC \u201994. ACM Press, Montreal, Quebec, Canada, 593\u2013602. DOI:10.1145\/195058.195412"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.2307\/1427640"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-01761-2"},{"issue":"5","key":"e_1_3_2_11_2","first-page":"2270","article-title":"Review and analysis of straggler handling techniques","volume":"7","author":"Bhandare Ashwin","year":"2016","unstructured":"Ashwin Bhandare, J. George, S. Deshpande, and Yash Karle. 2016. Review and analysis of straggler handling techniques. International Journal of Computer Science and Information Technologies 7, 5 (2016), 2270\u20132276.","journal-title":"International Journal of Computer Science and Information Technologies"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3233712"},{"key":"e_1_3_2_13_2","first-page":"285","volume-title":"Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201914)","author":"Boutin Eric","year":"2014","unstructured":"Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. 2014. Apollo: Scalable and coordinated scheduling for cloud-scale computing. 
In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201914). USENIX Association, USA, 285\u2013300."},{"key":"e_1_3_2_14_2","unstructured":"Broadcom. 2024. RabbitMQ: One broker to queue them all. (2024). Retrieved June 10 2024 from https:\/\/www.rabbitmq.com"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367519"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","DOI":"10.21236\/ADA555881","volume-title":"Design Insights for MapReduce from Diverse Production Workloads","author":"Chen Yanpei","year":"2012","unstructured":"Yanpei Chen, Sara Alspaugh, and Randy H. Katz. 2012. Design Insights for MapReduce from Diverse Production Workloads. Technical Report. University of California, Berkeley. Retrieved from https:\/\/www2.eecs.berkeley.edu\/Pubs\/TechRpts\/2012\/EECS-2012-17.html"},{"key":"e_1_3_2_17_2","first-page":"31","volume-title":"Proceedings of the 11th ACM Workshop on Hot Topics in Networks (HotNets-XI)","author":"Chowdhury Mosharaf","year":"2012","unstructured":"Mosharaf Chowdhury and Ion Stoica. 2012. Coflow: A networking abstraction for cluster applications. In Proceedings of the 11th ACM Workshop on Hot Topics in Networks (HotNets-XI). Association for Computing Machinery, New York, NY, USA, 31\u201336. DOI:10.1145\/2390231.2390237"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2829988.2787480"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/2740070.2626315"},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1145\/2806777.2806843","volume-title":"Proceedings of the 6th ACM Symposium on Cloud Computing (SoCC \u201915)","author":"Coppa Emilio","year":"2015","unstructured":"Emilio Coppa and Irene Finocchi. 2015. On data skewness, stragglers, and MapReduce progress indicators. In Proceedings of the 6th ACM Symposium on Cloud Computing (SoCC \u201915). 
Association for Computing Machinery, New York, NY, USA, 139\u2013152. DOI:10.1145\/2806777.2806843"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2023.05.017"},{"issue":"2","key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1145\/2408776.2408794","article-title":"The tail at scale: Software techniques that tolerate latency variability are vital to building responsive large-scale Web services","volume":"56","author":"Dean J.","year":"2013","unstructured":"J. Dean and L. A. Barroso. 2013. The tail at scale: Software techniques that tolerate latency variability are vital to building responsive large-scale Web services. Communications of the ACM 56, 2 (2013), 74\u201380.","journal-title":"Communications of the ACM"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1145\/2987550.2987563","volume-title":"Proceedings of the 7th ACM Symposium on Cloud Computing (SoCC \u201916)","author":"Delgado Pamela","year":"2016","unstructured":"Pamela Delgado, Diego Didona, Florin Dinu, and Willy Zwaenepoel. 2016. Job-aware scheduling in Eagle: Divide and stick to your probes. In Proceedings of the 7th ACM Symposium on Cloud Computing (SoCC \u201916). Association for Computing Machinery, New York, NY, USA, 497\u2013509. DOI:10.1145\/2987550.2987563"},{"key":"e_1_3_2_25_2","first-page":"499","volume-title":"Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC \u201915)","author":"Delgado Pamela","year":"2015","unstructured":"Pamela Delgado, Florin Dinu, Anne-Marie Kermarrec, and Willy Zwaenepoel. 2015. Hawk: Hybrid datacenter scheduling. In Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC \u201915). 
USENIX Association, USA, 499\u2013510."},{"key":"e_1_3_2_26_2","first-page":"127","volume-title":"Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS \u201914)","author":"Delimitrou Christina","year":"2014","unstructured":"Christina Delimitrou and Christos Kozyrakis. 2014. Quasar: Resource-efficient and QoS-aware cluster management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS \u201914). Association for Computing Machinery, New York, NY, USA, 127\u2013144. DOI:10.1145\/2541940.2541941"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1145\/2806777.2806779","volume-title":"Proceedings of the 6th ACM Symposium on Cloud Computing (SoCC \u201915)","author":"Delimitrou Christina","year":"2015","unstructured":"Christina Delimitrou, Daniel Sanchez, and Christos Kozyrakis. 2015. Tarcil: Reconciling scheduling speed and quality in large shared clusters. In Proceedings of the 6th ACM Symposium on Cloud Computing (SoCC \u201915). Association for Computing Machinery, New York, NY, USA, 97\u2013110. DOI:10.1145\/2806777.2806779"},{"key":"e_1_3_2_28_2","first-page":"2403","volume-title":"Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT)","author":"Dutta Sanghamitra","year":"2017","unstructured":"Sanghamitra Dutta, Viveck Cadambe, and Pulkit Grover. 2017. Coded convolution for parallel and distributed computing within a deadline. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, USA, 2403\u20132407. DOI:10.1109\/ISIT.2017.8006960"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","unstructured":"S. Dutta V. Cadambe and P. Grover. 2017. \u201cShort-Dot\u201d: Computing large linear transforms distributedly using coded short Dot products. In IEEE Transactions on Information Theory 65 10 (2017) 6171\u20136193. 
DOI:10.1109\/TIT.2019.2927558","DOI":"10.1109\/TIT.2019.2927558"},{"key":"e_1_3_2_30_2","unstructured":"A. K. Erlang. 1917. Solution of some problems in the theory of probabilities of significance in automatic telephone exchanges. The Post Office Engineer\u2019s Journal 10 (1917) 189\u2013197."},{"issue":"5","key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"1041","DOI":"10.1137\/0144074","article-title":"Two parallel queues created by arrivals with two demands I","volume":"44","author":"Flatto L.","year":"1984","unstructured":"L. Flatto and S. Hahn. 1984. Two parallel queues created by arrivals with two demands I. SIAM Journal on Applied Mathematics 44, 5 (1984), 1041\u20131053.","journal-title":"SIAM Journal on Applied Mathematics"},{"key":"e_1_3_2_32_2","first-page":"363","volume-title":"Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation","author":"Fouladi Sadjad","year":"2017","unstructured":"Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and Keith Winstein. 2017. Encoding, fast and slow: Low-latency video processing using thousands of tiny threads. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, USA, 363\u2013376."},{"key":"e_1_3_2_33_2","unstructured":"Apache Software Foundation. 2023. Apache OpenWhisk. (2023). Retrieved December 19 2023 from https:\/\/openwhisk.apache.org"},{"key":"e_1_3_2_34_2","unstructured":"Apache Software Foundation. 2024. Apache Hadoop. (2024). Retrieved June 13 2024 from https:\/\/hadoop.apache.org\/"},{"key":"e_1_3_2_35_2","unstructured":"Apache Software Foundation. 2024. Apache Kafka. (2024). Retrieved February 12 2024 from https:\/\/kafka.apache.org\/"},{"key":"e_1_3_2_36_2","unstructured":"Apache Software Foundation. 2024. CouchDB. (2024). 
Retrieved June 12 2024 from https:\/\/couchdb.apache.org\/"},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1145\/2745844.2745873","volume-title":"Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS \u201915)","author":"Gardner Kristen","year":"2015","unstructured":"Kristen Gardner, Samuel Zbarsky, Sherwin Doroudi, Mor Harchol-Balter, and Esa Hyytia. 2015. Reducing latency via redundant requests: Exact analysis. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS \u201915). Association for Computing Machinery, New York, NY, USA, 347\u2013360. DOI:10.1145\/2745844.2745873"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2016.2611578"},{"key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1145\/2465351.2465387","volume-title":"Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys \u201913)","author":"Ghodsi Ali","year":"2013","unstructured":"Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica. 2013. Choosy: Max-min fair sharing for datacenter jobs with constraints. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys \u201913). Association for Computing Machinery, New York, NY, USA, 365\u2013378. DOI:10.1145\/2465351.2465387"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-020-03241-x"},{"key":"e_1_3_2_41_2","first-page":"99","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Gog Ionel","year":"2016","unstructured":"Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, and Steven Hand. 2016. Firmament: Fast, centralized cluster scheduling at scale. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 
USENIX Association, Savannah, GA, 99\u2013115."},{"key":"e_1_3_2_42_2","volume-title":"Fundamentals of queueing theory (3 ed.)","author":"Gross Donald","year":"1998","unstructured":"Donald Gross and Carl M. Harris. 1998. Fundamentals of queueing theory (3 ed.). Wiley, New York."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2587641"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1007\/978-3-540-45232-4_10","volume-title":"Proceedings of the Computer Performance Evaluation. Modelling Techniques and Tools","author":"Harrison Peter","year":"2003","unstructured":"Peter Harrison and Soraya Zertal. 2003. Queueing models with maxima of service times. In Proceedings of the Computer Performance Evaluation. Modelling Techniques and Tools. Peter Kemper and William H. Sanders (Eds.), Springer, Berlin, 152\u2013168. DOI:10.1007\/978-3-540-45232-4_10"},{"key":"e_1_3_2_45_2","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1007\/978-981-16-8721-1_39","volume-title":"Proceedings of the Micro-Electronics and Telecommunication Engineering","author":"Hasan Balqees Talal","year":"2022","unstructured":"Balqees Talal Hasan and Dhuha Basheer Abdullah. 2022. A survey of scheduling tasks in big data: Apache Spark. In Proceedings of the Micro-Electronics and Telecommunication Engineering. Devendra Kumar Sharma, Sheng-Lung Peng, Rohit Sharma, and Dmitry A. Zaitsev (Eds.), Springer Nature, Singapore, 405\u2013414. DOI:10.1007\/978-981-16-8721-1_39"},{"key":"e_1_3_2_46_2","first-page":"295","volume-title":"Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI\u201911)","author":"Hindman Benjamin","year":"2011","unstructured":"Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A platform for fine-grained resource sharing in the data center. 
In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI\u201911). USENIX Association, USA, 295\u2013308."},{"key":"e_1_3_2_47_2","first-page":"127","volume-title":"Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM \u201912)","author":"Hong Chi-Yao","year":"2012","unstructured":"Chi-Yao Hong, Matthew Caesar, and P. Brighten Godfrey. 2012. Finishing flows quickly with preemptive scheduling. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM \u201912). Association for Computing Machinery, New York, NY, USA, 127\u2013138. DOI:10.1145\/2342356.2342389"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1287\/opre.9.6.841"},{"key":"e_1_3_2_49_2","first-page":"2766","volume-title":"Proceedings of the 2012 IEEE International Symposium on Information Theory Proceedings","author":"Huang Longbo","year":"2012","unstructured":"Longbo Huang, Sameer Pawar, Hao Zhang, and Kannan Ramchandran. 2012. Codes can reduce queueing delay in data centers. In Proceedings of the 2012 IEEE International Symposium on Information Theory Proceedings. IEEE, USA, 2766\u20132770. DOI:10.1109\/ISIT.2012.6284026"},{"key":"e_1_3_2_50_2","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1145\/3335484.3335493","volume-title":"Proceedings of the 4th International Conference on Big Data and Computing (ICBDC \u201919)","author":"Huang Xiaohan","year":"2019","unstructured":"Xiaohan Huang, Chunlin Li, and Youlong Luo. 2019. Optimized speculative execution strategy for different workload levels in heterogeneous Spark cluster. In Proceedings of the 4th International Conference on Big Data and Computing (ICBDC \u201919). Association for Computing Machinery, New York, NY, USA, 6\u201310. 
DOI:10.1145\/3335484.3335493"},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1145\/3106989.3107004","volume-title":"Proceedings of the 1st Asia-Pacific Workshop on Networking (APNet \u201917)","author":"Huang Xin Sunny","year":"2017","unstructured":"Xin Sunny Huang and T. S. Eugene Ng. 2017. Exploiting inter-flow relationship for coflow placement in datacenters. In Proceedings of the 1st Asia-Pacific Workshop on Networking (APNet \u201917). Association for Computing Machinery, New York, NY, USA, 113\u2013119. DOI:10.1145\/3106989.3107004"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/1272998.1273005"},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1145\/1629575.1629601","volume-title":"Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP \u201909)","author":"Isard Michael","year":"2009","unstructured":"Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. 2009. Quincy: Fair scheduling for distributed computing clusters. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP \u201909). Association for Computing Machinery, New York, NY, USA, 261\u2013276. DOI:10.1145\/1629575.1629601"},{"key":"e_1_3_2_54_2","first-page":"237","volume-title":"Proceedings of the 2023 IEEE International Conference on Big Data (BigData)","author":"Liu J.","year":"2023","unstructured":"J. Liu, Y. Lao, Y. Mao, and R. Buyya. 2023. Sailfish: A dependency-aware and resource efficient scheduling for low latency in clouds. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData). IEEE, Sorrento, Italy, 237\u2013246. 
DOI:10.1109\/BigData59044.2023.10386947"},{"key":"e_1_3_2_55_2","first-page":"152","volume-title":"Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS \u201921)","author":"Jia Zhipeng","year":"2021","unstructured":"Zhipeng Jia and Emmett Witchel. 2021. Nightcore: Efficient and scalable serverless computing for latency-sensitive, interactive microservices. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS \u201921). Association for Computing Machinery, New York, NY, USA, 152\u2013166. DOI:10.1145\/3445814.3446701"},{"key":"e_1_3_2_56_2","volume-title":"Urn Models and Their Application: An Approach to Modern Discrete Probability Theory","author":"Johnson Norman Lloyd","year":"1977","unstructured":"Norman Lloyd Johnson and Samuel Kotz. 1977. Urn Models and Their Application: An Approach to Modern Discrete Probability Theory. Wiley, New York."},{"key":"e_1_3_2_57_2","first-page":"117","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Jyothi Sangeetha Abdu","year":"2016","unstructured":"Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Ruslan Mavlyutov, Inigo Goiri, Subru Krishnan, Janardhan Kulkarni, et\u00a0al. 2016. Morpheus: Towards automated SLOs for enterprise clusters. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, Savannah, GA, 117\u2013134."},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1007\/978-3-319-96983-1_13","volume-title":"Proceedings of the Euro-Par 2018: Parallel Processing","author":"Khelghatdoust Mansour","year":"2018","unstructured":"Mansour Khelghatdoust and Vincent Gramoli. 2018. 
Peacock: Probe-based scheduling of jobs by rotating between elastic queues. In Proceedings of the Euro-Par 2018: Parallel Processing. Marco Aldinucci, Luca Padovani, and Massimo Torquati (Eds.), Springer International Publishing, Cham, 178\u2013191. DOI:10.1007\/978-3-319-96983-1_13"},{"key":"e_1_3_2_59_2","volume-title":"Queuing Systems Volume II: Computer Applications","author":"Kleinrock Leonard","year":"1976","unstructured":"Leonard Kleinrock. 1976. Queuing Systems Volume II: Computer Applications. John Wiley and Sons, USA."},{"key":"e_1_3_2_60_2","volume-title":"Distributed Inference With Straggler Mitigation","author":"Krishna Lolla Sai","year":"2019","unstructured":"Lolla Sai Krishna. 2019. Distributed Inference With Straggler Mitigation. Master\u2019s thesis. Indian Institute of Technology, Hyderabad."},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.14257\/ijgdc.2014.7.4.13"},{"key":"e_1_3_2_62_2","first-page":"12","volume-title":"Proceedings of the 23rd UK Performance Engineering Workshop","author":"Lebrecht Abigail S.","year":"2007","unstructured":"Abigail S. Lebrecht and William J. Knottenbelt. 2007. Response time approximations in fork-join queues. In Proceedings of the 23rd UK Performance Engineering Workshop. Edge Hill University, UK, 12\u201320."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","unstructured":"K. Lee M. Lam R. Pedarsani D. Papailiopoulos and K. Ramchandran. 2018. Speeding Up Distributed Machine Learning Using Codes. IEEE Transactions on Information Theory 64 3 (2018) 1514\u20131529. DOI:10.1109\/TIT.2017.2736066","DOI":"10.1109\/TIT.2017.2736066"},{"key":"e_1_3_2_64_2","first-page":"2418","volume-title":"Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT)","author":"Lee Kangwook","year":"2017","unstructured":"Kangwook Lee, Changho Suh, and Kannan Ramchandran. 2017. High-dimensional coded matrix multiplication. 
In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, USA, 2418\u20132422. DOI:10.1109\/ISIT.2017.8006963"},{"key":"e_1_3_2_65_2","first-page":"311","volume-title":"Proceedings of the 2011 IEEE 8th International Conference on e-Business Engineering","author":"Lei Lei","year":"2011","unstructured":"Lei Lei, Tianyu Wo, and Chunming Hu. 2011. CREST: Towards fast speculation of straggler tasks in MapReduce. In Proceedings of the 2011 IEEE 8th International Conference on e-Business Engineering. IEEE, USA, 311\u2013316. DOI:10.1109\/ICEBE.2011.37"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1201\/9780203489802"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","unstructured":"Songze Li Mohammad Ali Maddah-Ali Qian Yu and A. Salman Avestimehr. 2018. A fundamental tradeoff between computation and communication in distributed computing. IEEE Transactions on Information Theory 64 1 (2018) 109\u2013128. DOI:10.1109\/TIT.2017.2756959","DOI":"10.1109\/TIT.2017.2756959"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12083-017-0590-4"},{"key":"e_1_3_2_69_2","first-page":"237","volume-title":"Proceedings of the 2023 IEEE International Conference on Big Data (BigData)","author":"Liu Jinwei","year":"2023","unstructured":"Jinwei Liu, Yingjie Lao, Ying Mao, and Rajkumar Buyya. 2023. Sailfish: A dependency-aware and resource efficient scheduling for low latency in clouds. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData). IEEE, USA, 237\u2013246. DOI:10.1109\/BigData59044.2023.10386947"},{"key":"e_1_3_2_70_2","first-page":"1","volume-title":"Proceedings of the 11th European Conference on Computer Systems","author":"Lozi Jean-Pierre","year":"2016","unstructured":"Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud, Vivien Qu\u00e9ma, and Alexandra Fedorova. 2016. The Linux scheduler: A decade of wasted cores. 
In Proceedings of the 11th European Conference on Computer Systems. ACM, London, United Kingdom, 1\u201316. DOI:10.1145\/2901318.2901326"},{"key":"e_1_3_2_71_2","first-page":"595","volume-title":"Proceedings of the 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","author":"Merzky Andre","year":"2022","unstructured":"Andre Merzky, Matteo Turilli, and Shantenu Jha. 2022. RAPTOR: Ravenous throughput computing. In Proceedings of the 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). IEEE, USA, 595\u2013604. DOI:10.1109\/CCGrid54584.2022.00069"},{"key":"e_1_3_2_72_2","volume-title":"The Power of Two Choices in Randomized Load Balancing","author":"Mitzenmacher Michael David","year":"1996","unstructured":"Michael David Mitzenmacher. 1996. The Power of Two Choices in Randomized Load Balancing. Ph.D. Dissertation. University of California, Berkeley."},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.963420"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2020.3013548"},{"key":"e_1_3_2_75_2","first-page":"137","volume-title":"Proceedings of the 2015 IEEE 31st International Conference on Data Engineering","author":"Nasir Muhammad Anis Uddin","year":"2015","unstructured":"Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, David Garc\u00eda-Soriano, Nicolas Kourtellis, and Marco Serafini. 2015. The power of both choices: Practical load balancing for distributed stream processing engines. In Proceedings of the 2015 IEEE 31st International Conference on Data Engineering. IEEE, USA, 137\u2013148. 
DOI:10.1109\/ICDE.2015.7113279"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2021.3091684"},{"key":"e_1_3_2_77_2","first-page":"14","volume-title":"Proceedings of the 14th USENIX Conference on Hot Topics in Operating Systems (HotOS\u201913)","author":"Ousterhout Kay","year":"2013","unstructured":"Kay Ousterhout, Aurojit Panda, Joshua Rosen, Shivaram Venkataraman, Reynold Xin, Sylvia Ratnasamy, Scott Shenker, and Ion Stoica. 2013. The case for tiny tasks in compute clusters. In Proceedings of the 14th USENIX Conference on Hot Topics in Operating Systems (HotOS\u201913). USENIX Association, USA, 14."},{"key":"e_1_3_2_78_2","first-page":"69","volume-title":"Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP \u201913)","author":"Ousterhout Kay","year":"2013","unstructured":"Kay Ousterhout, Patrick Wendell, Matei Zaharia, and Ion Stoica. 2013. Sparrow: Distributed, low latency scheduling. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP \u201913). Association for Computing Machinery, New York, NY, USA, 69\u201384. DOI:10.1145\/2517349.2522716"},{"key":"e_1_3_2_79_2","first-page":"73","volume-title":"Proceedings of the 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS)","author":"Ouyang Xue","year":"2017","unstructured":"Xue Ouyang, Changjian Wang, Renyu Yang, Guogui Yang, Paul Townend, and Jie Xu. 2017. ML-NA: A machine learning based node performance analyzer utilizing straggler statistics. In Proceedings of the 2017 IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS). IEEE, USA, 73\u201380. DOI:10.1109\/ICPADS.2017.00021"},{"key":"e_1_3_2_80_2","first-page":"47","volume-title":"Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Panda Biswaranjan","year":"2019","unstructured":"Biswaranjan Panda, Deepthi Srinivasan, Huan Ke, Karan Gupta, Vinayak Khot, and Haryadi S. Gunawi. 2019. 
IASO: A fail-slow detection and mitigation framework for distributed storage services. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC 19). USENIX Association, Renton, WA, 47\u201362."},{"key":"e_1_3_2_81_2","volume-title":"Computer Organization and Design: The Hardware\/Software Interface","author":"Patterson David A.","year":"2011","unstructured":"David A. Patterson and John L. Hennessy. 2011. Computer Organization and Design: The Hardware\/Software Interface. Elsevier Science and Technology, St. Louis, USA."},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/3328740"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-2361-4"},{"key":"e_1_3_2_84_2","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1109\/SOSE58276.2023.00024","volume-title":"Proceedings of the 2023 IEEE International Conference on Service-Oriented System Engineering (SOSE)","author":"Pujol Victor Casamayor","year":"2023","unstructured":"Victor Casamayor Pujol, Andrea Morichetta, and Stefan Nastic. 2023. Intelligent sampling: A novel approach to optimize workload scheduling in large-scale heterogeneous computing continuum. In Proceedings of the 2023 IEEE International Conference on Service-Oriented System Engineering (SOSE). IEEE, USA, 140\u2013149. DOI:10.1109\/SOSE58276.2023.00024"},{"key":"e_1_3_2_85_2","first-page":"218","volume-title":"Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS \u201903)","author":"Rai Idris A.","year":"2003","unstructured":"Idris A. Rai, Guillaume Urvoy-Keller, and Ernst W. Biersack. 2003. Analysis of LAS scheduling for job size distributions with high variance. In Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS \u201903). Association for Computing Machinery, New York, NY, USA, 218\u2013228. 
DOI:10.1145\/781027.781055"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2020.3029396"},{"key":"e_1_3_2_87_2","first-page":"379","volume-title":"Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM \u201915)","author":"Ren Xiaoqi","year":"2015","unstructured":"Xiaoqi Ren, Ganesh Ananthanarayanan, Adam Wierman, and Minlan Yu. 2015. Hopper: Decentralized speculation-aware cluster scheduling at scale. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM \u201915). Association for Computing Machinery, New York, NY, USA, 379\u2013392. DOI:10.1145\/2785956.2787481"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.2307\/3214245"},{"key":"e_1_3_2_89_2","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1145\/2465351.2465386","volume-title":"Proceedings of the 8th ACM European Conference on Computer Systems","author":"Schwarzkopf Malte","year":"2013","unstructured":"Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. 2013. Omega: Flexible, scalable schedulers for large compute clusters. In Proceedings of the 8th ACM European Conference on Computer Systems. ACM, Prague Czech Republic, 351\u2013364. DOI:10.1145\/2465351.2465386"},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01149176"},{"key":"e_1_3_2_91_2","first-page":"3368","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Tandon Rashish","year":"2017","unstructured":"Rashish Tandon, Qi Lei, Alexandros G. Dimakis, and Nikos Karampatziakis. 2017. Gradient coding: Avoiding stragglers in distributed learning. In Proceedings of the 34th International Conference on Machine Learning. 
JMLR, USA, 3368\u20133376."},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1145\/2628913"},{"key":"e_1_3_2_93_2","volume-title":"Proceedings of the 15th European Conference on Computer Systems (EuroSys \u201920)","author":"Tirmazi Muhammad","year":"2020","unstructured":"Muhammad Tirmazi, Adam Barker, Nan Deng, Md E. Haque, Zhijing Gene Qin, Steven Hand, Mor Harchol-Balter, and John Wilkes. 2020. Borg: The next Generation. In Proceedings of the 15th European Conference on Computer Systems (EuroSys \u201920). Association for Computing Machinery, New York, NY, USA, Article 30, 14 pages. DOI:10.1145\/3342195.3387517"},{"key":"e_1_3_2_94_2","volume-title":"Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC \u201912)","author":"Tumanov Alexey","year":"2012","unstructured":"Alexey Tumanov, James Cipar, Gregory R. Ganger, and Michael A. Kozuch. 2012. Alsched: Algebraic scheduling of mixed workloads in heterogeneous clouds. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC \u201912). Association for Computing Machinery, New York, NY, USA, Article 25, 7 pages. DOI:10.1145\/2391229.2391254"},{"key":"e_1_3_2_95_2","volume-title":"Proceedings of the 11th European Conference on Computer Systems (EuroSys \u201916)","author":"Tumanov Alexey","year":"2016","unstructured":"Alexey Tumanov, Timothy Zhu, Jun Woo Park, Michael A. Kozuch, Mor Harchol-Balter, and Gregory R. Ganger. 2016. TetriSched: Global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters. In Proceedings of the 11th European Conference on Computer Systems (EuroSys \u201916). Association for Computing Machinery, New York, NY, USA, Article 35, 16 pages. 
DOI:10.1145\/2901318.2901355"},{"key":"e_1_3_2_96_2","first-page":"333","volume-title":"Proceedings of the 19th European Conference on Computer Systems (EuroSys \u201924)","author":"Udayashankar Sreeharsha","year":"2024","unstructured":"Sreeharsha Udayashankar, Ashraf Abdel-Hadi, Ali Mashtizadeh, and Samer Al-Kiswany. 2024. Draconis: Network-accelerated scheduling for microsecond-scale workloads. In Proceedings of the 19th European Conference on Computer Systems (EuroSys \u201924). Association for Computing Machinery, New York, NY, USA, 333\u2013348. DOI:10.1145\/3627703.3650060"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79181"},{"key":"e_1_3_2_98_2","volume-title":"Proceedings of the 4th Annual Symposium on Cloud Computing (SOCC \u201913)","author":"Vavilapalli Vinod Kumar","year":"2013","unstructured":"Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, et\u00a0al. 2013. Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th Annual Symposium on Cloud Computing (SOCC \u201913). Association for Computing Machinery, New York, NY, USA, Article 5, 16 pages. DOI:10.1145\/2523616.2523633"},{"key":"e_1_3_2_99_2","first-page":"301","volume-title":"Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)","author":"Venkataraman Shivaram","year":"2014","unstructured":"Shivaram Venkataraman, Aurojit Panda, Ganesh Ananthanarayanan, Michael J. Franklin, and Ion Stoica. 2014. The power of choice in data-aware cluster scheduling. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 
USENIX Association, Broomfield, CO, 301\u2013316."},{"key":"e_1_3_2_100_2","first-page":"177","volume-title":"Proceedings of the 2009 6th International Conference on Information Technology: New Generations","author":"Vladu\u0161ic Daniel","year":"2009","unstructured":"Daniel Vladu\u0161ic, Ale\u0161 Cernivec, and Bo\u0161tjan Slivnik. 2009. Improving job scheduling in GRID environments with use of simple machine learning methods. In Proceedings of the 2009 6th International Conference on Information Technology: New Generations. IEEE, Las Vegas, Nevada, 177\u2013182. DOI:10.1109\/ITNG.2009.228"},{"key":"e_1_3_2_101_2","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1145\/2591971.2592042","volume-title":"Proceedings of the 2014 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS \u201914)","author":"Wang Da","year":"2014","unstructured":"Da Wang, Gauri Joshi, and Gregory Wornell. 2014. Efficient task replication for fast response times in parallel computation. In Proceedings of the 2014 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS \u201914). Association for Computing Machinery, New York, NY, USA, 599\u2013600. DOI:10.1145\/2591971.2592042"},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2024.110429"},{"key":"e_1_3_2_103_2","first-page":"324","volume-title":"Proceedings of the 2021 IEEE International Conference on Parallel and Distributed Processing with Applications, Big Data and Cloud Computing, Sustainable Computing and Communications, Social Computing and Networking (ISPA\/BDCloud\/SocialCom\/SustainCom)","author":"Wang Yuzhao","year":"2021","unstructured":"Yuzhao Wang, Junqing Yu, and Zhibin Yu. 2021. Treator: A fast centralized cluster scheduling at scale based on B+ tree and BSP. 
In Proceedings of the 2021 IEEE International Conference on Parallel and Distributed Processing with Applications, Big Data and Cloud Computing, Sustainable Computing and Communications, Social Computing and Networking (ISPA\/BDCloud\/SocialCom\/SustainCom). IEEE, USA, 324\u2013335. DOI:10.1109\/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00053"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","DOI":"10.1631\/FITEE.2100298"},{"key":"e_1_3_2_105_2","first-page":"310","volume-title":"Proceedings of the 2020 20th IEEE\/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)","author":"Yabuuchi Hidehito","year":"2020","unstructured":"Hidehito Yabuuchi and Takahiro Shinagawa. 2020. Multi-resource Low-latency Cluster Scheduling without Execution Time Estimation. In Proceedings of the 2020 20th IEEE\/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). IEEE, Melbourne, Australia, 310\u2013319. DOI:10.1109\/CCGrid49817.2020.00-62"},{"key":"e_1_3_2_106_2","series-title":"Lecture Notes in Computer Science","first-page":"44","volume-title":"Proceedings of the Job Scheduling Strategies for Parallel Processing.","author":"Yoo Andy B.","year":"2003","unstructured":"Andy B. Yoo, Morris A. Jette, and Mark Grondona. 2003. SLURM: Simple Linux utility for resource management. In Proceedings of the Job Scheduling Strategies for Parallel Processing. Dror Feitelson, Larry Rudolph, and Uwe Schwiegelshohn (Eds.), Lecture Notes in Computer Science, Springer, Berlin, 44\u201360. DOI:10.1007\/10968987_3"},{"key":"e_1_3_2_107_2","first-page":"494","volume-title":"Proceedings of the 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton)","author":"Yu Qian","year":"2017","unstructured":"Qian Yu, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. 2017. Coded Fourier transform. In Proceedings of the 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, USA, 494\u2013501. 
DOI:10.1109\/ALLERTON.2017.8262778"},{"key":"e_1_3_2_108_2","first-page":"4406","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Yu Qian","year":"2017","unstructured":"Qian Yu, Mohammad Ali Maddah-Ali, and A. Salman Avestimehr. 2017. Polynomial codes: An optimal design for high-dimensional coded matrix multiplication. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917). Curran Associates Inc., Red Hook, NY, USA, 4406\u20134416."},{"key":"e_1_3_2_109_2","volume-title":"Job Scheduling for Multi-User MapReduce Clusters","author":"Zaharia Matei","year":"2009","unstructured":"Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. 2009. Job Scheduling for Multi-User MapReduce Clusters. Technical Report UCB\/EECS-2009-55. University of California, Berkeley, California."},{"key":"e_1_3_2_110_2","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1145\/1755913.1755940","volume-title":"Proceedings of the 5th European Conference on Computer Systems","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia, Dhruba Borthakur, Joydeep Sen Sarma, Khaled Elmeleegy, Scott Shenker, and Ion Stoica. 2010. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In Proceedings of the 5th European Conference on Computer Systems. ACM, Paris France, 265\u2013278. DOI:10.1145\/1755913.1755940"},{"key":"e_1_3_2_111_2","first-page":"2","volume-title":"Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI\u201912)","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. 
In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI\u201912). USENIX Association, USA, 2."},{"key":"e_1_3_2_112_2","first-page":"29","volume-title":"Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201908)","author":"Zaharia Matei","year":"2008","unstructured":"Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica. 2008. Improving MapReduce performance in heterogeneous environments. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201908). USENIX Association, USA, 29\u201342."},{"key":"e_1_3_2_113_2","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1109\/INFOCOM.2015.7218408","volume-title":"Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM)","author":"Zhao Yangming","year":"2015","unstructured":"Yangming Zhao, Kai Chen, Wei Bai, Minlan Yu, Chen Tian, Yanhui Geng, Yiming Zhang, Dan Li, and Sheng Wang. 2015. Rapier: Integrating routing and scheduling for coflow-aware data center networks. In Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, USA, 424\u2013432. DOI:10.1109\/INFOCOM.2015.7218408"},{"key":"e_1_3_2_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2020.3037064"},{"key":"e_1_3_2_115_2","first-page":"170","volume-title":"Proceedings of the 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)","author":"Zhao Yuxuan","year":"2022","unstructured":"Yuxuan Zhao and Alexandru Uta. 2022. Tiny autoscalers for tiny workloads: Dynamic CPU allocation for serverless functions. In Proceedings of the 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). IEEE, Piscataway, NJ, USA, 170\u2013179. 
DOI:10.1109\/CCGrid54584.2022.00026"},{"key":"e_1_3_2_116_2","volume-title":"Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201920)","author":"Zhu Hang","year":"2020","unstructured":"Hang Zhu, Kostis Kaffes, Zixu Chen, Zhenming Liu, Christos Kozyrakis, Ion Stoica, and Xin Jin. 2020. RackSched: A microsecond-scale scheduler for rack-scale computers. In Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation (OSDI\u201920). USENIX Association, USA, Article 69, 16 pages."},{"key":"e_1_3_2_117_2","unstructured":"Moshe Zukerman. 2023. Introduction to queueing theory and stochastic teletraffic models."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3766543","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,6]],"date-time":"2025-10-06T13:59:18Z","timestamp":1759759158000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3766543"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,6]]},"references-count":116,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2026,3,31]]}},"alternative-id":["10.1145\/3766543"],"URL":"https:\/\/doi.org\/10.1145\/3766543","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"type":"print","value":"0360-0300"},{"type":"electronic","value":"1557-7341"}],"subject":[],"published":{"date-parts":[[2025,10,6]]},"assertion":[{"value":"2024-07-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-15","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication
History"}}]}}