{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T16:29:12Z","timestamp":1781022552535,"version":"3.54.1"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,12,11]],"date-time":"2024-12-11T00:00:00Z","timestamp":1733875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Model. Perform. Eval. Comput. Syst."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>A correct evaluation of scheduling algorithms and a good understanding of their optimization criteria are key components of resource management in HPC. In this work, we discuss the biases and limitations of the most frequent optimization metrics from the literature. We provide elements on how to evaluate performance when studying HPC batch scheduling.<\/jats:p>\n          <jats:p>We experimentally demonstrate these limitations by focusing on two use cases: a study on the impact of runtime estimates on scheduling performance, and the reproduction of a recent high-impact work that designed an HPC batch scheduler based on a network trained with reinforcement learning. We demonstrate that focusing on a quantitative optimization criterion (\u201cour work improves the literature by X%\u201d) may hide extremely important caveats, to the point that the results obtained are opposed to the actual goals of the authors.<\/jats:p>\n          <jats:p>Key findings show that mean bounded slowdown and mean response time are hazardous for a purely quantitative analysis in the context of HPC. Despite some limitations, utilization appears to be a good objective. We propose to complement it with the standard deviation of the throughput in some pathological cases. 
Finally, we argue for a larger use of area-weighted response time, that we find to be a very relevant objective.<\/jats:p>","DOI":"10.1145\/3701986","type":"journal-article","created":{"date-parts":[[2024,10,29]],"date-time":"2024-10-29T10:11:44Z","timestamp":1730196704000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Qualitatively Analyzing Optimization Objectives in the Design of HPC Resource Manager"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-5016-4413","authenticated-orcid":false,"given":"Robin","family":"Bo\u00ebzennec","sequence":"first","affiliation":[{"name":"INRIA, Rennes, France, Universit\u00e9 of Rennes, Rennes, France, CNRS, Rennes, France and IRISA, IRISA, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2260-2200","authenticated-orcid":false,"given":"Fanny","family":"Dufoss\u00e9","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Grenoble Alpes, Grenoble, France, INRIA, Grenoble, France, CNRS, Grenoble, France and Grenoble INP UGA, Grenoble, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8862-3277","authenticated-orcid":false,"given":"Guillaume","family":"Pallez","sequence":"additional","affiliation":[{"name":"INRIA, Rennes, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,12,11]]},"reference":[{"key":"e_1_3_4_2_2","first-page":"253","volume-title":"Workshop on Job Scheduling Strategies for Parallel Processing","author":"Lee Cynthia Bailey","year":"2004","unstructured":"Cynthia Bailey Lee, Yael Schwartzman, Jennifer Hardy, and Allan Snavely. 2004. Are user runtime estimates inherently inaccurate?. In Workshop on Job Scheduling Strategies for Parallel Processing. 
Springer, 253\u2013263."},{"key":"e_1_3_4_3_2","first-page":"1","volume-title":"JSSPP 2023-26th Edition of the Workshop on Job Scheduling Strategies for Parallel Processing","author":"Bo\u00ebzennec Robin","year":"2023","unstructured":"Robin Bo\u00ebzennec, Fanny Dufoss\u00e9, and Guillaume Pallez. 2023. Optimization metrics for the evaluation of batch schedulers in HPC. In JSSPP 2023-26th Edition of the Workshop on Job Scheduling Strategies for Parallel Processing. 1\u201319."},{"key":"e_1_3_4_4_2","article-title":"IO-SETS: Simple and efficient approaches for I\/O bandwidth management","author":"Boito Francieli","year":"2023","unstructured":"Francieli Boito, Guillaume Pallez, Luan Teylo, and Nicolas Vidal. 2023. IO-SETS: Simple and efficient approaches for I\/O bandwidth management. IEEE Transactions on Parallel and Distributed Systems 34, 10 (2023), 2783\u20132796.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_4_5_2","first-page":"1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Carastan-Santos Danilo","year":"2017","unstructured":"Danilo Carastan-Santos and Raphael Y. De Camargo. 2017. Obtaining dynamic scheduling policies with simulation and machine learning. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1\u201313."},{"key":"e_1_3_4_6_2","first-page":"1","volume-title":"2019 19th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID\u201919)","author":"Carastan-Santos Danilo","year":"2019","unstructured":"Danilo Carastan-Santos, Raphael Y. De Camargo, Denis Trystram, and Salah Zrigui. 2019. One can only gain by replacing EASY Backfilling: A simple scheduling policies case study. In 2019 19th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID\u201919). 
IEEE, 1\u201310."},{"key":"e_1_3_4_7_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1007\/3-540-36180-4_7","volume-title":"Job Scheduling Strategies for Parallel Processing: 8th International Workshop, JSSPP 2002 Edinburgh, Scotland, UK, July 24, 2002 Revised Papers 8","author":"Chiang Su-Hui","year":"2002","unstructured":"Su-Hui Chiang, Andrea Arpaci-Dusseau, and Mary K. Vernon. 2002. The impact of more accurate requested runtimes on production job scheduling performance. In Job Scheduling Strategies for Parallel Processing: 8th International Workshop, JSSPP 2002 Edinburgh, Scotland, UK, July 24, 2002 Revised Papers 8. Springer, 103\u2013127."},{"key":"e_1_3_4_8_2","doi-asserted-by":"crossref","unstructured":"Yishu Du Loris Marchal Guillaume Pallez and Yves Robert. 2024. Improving batch schedulers with node stealing for failed jobs. Concurrency and Computation: Practice and Experience 36 12 (2024) e8043.","DOI":"10.1002\/cpe.8043"},{"key":"e_1_3_4_9_2","volume-title":"20th Workshop on Job Scheduling Strategies for Parallel Processing","author":"Dutot Pierre-Fran\u00e7ois","year":"2016","unstructured":"Pierre-Fran\u00e7ois Dutot, Michael Mercier, Millian Poquet, and Olivier Richard. 2016. Batsim: A realistic language-independent resources and jobs management systems simulator. In 20th Workshop on Job Scheduling Strategies for Parallel Processing. Chicago, United States. Retrieved from https:\/\/hal.archives-ouvertes.fr\/hal-01333471"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3205325"},{"key":"e_1_3_4_11_2","first-page":"530","volume-title":"2017 IEEE International Conference on Cluster Computing (CLUSTER\u201917)","author":"Fan Yuping","year":"2017","unstructured":"Yuping Fan, Paul Rich, William E. Allcock, Michael E. Papka, and Zhiling Lan. 2017. Trade-off between prediction accuracy and underestimation rate in job runtime estimates. In 2017 IEEE International Conference on Cluster Computing (CLUSTER\u201917). 
IEEE, 530\u2013540."},{"key":"e_1_3_4_12_2","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1007\/3-540-45540-X_11","volume-title":"Workshop on Job Scheduling Strategies for Parallel Processing","author":"Feitelson Dror G.","year":"2001","unstructured":"Dror G. Feitelson. 2001. Metrics for parallel job scheduling and their convergence. In Workshop on Job Scheduling Strategies for Parallel Processing. Springer, 188\u2013205."},{"issue":"09","key":"e_1_3_4_13_2","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/MC.2003.1231190","article-title":"Metric and workload effects on computer systems evaluation","volume":"36","author":"Feitelson Dror G.","year":"2003","unstructured":"Dror G. Feitelson. 2003. Metric and workload effects on computer systems evaluation. Computer 36, 09 (2003), 18\u201325.","journal-title":"Computer"},{"key":"e_1_3_4_14_2","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/11605300_13","volume-title":"Job Scheduling Strategies for Parallel Processing: 11th International Workshop, JSSPP 2005, Cambridge, MA, USA, June 19, 2005, Revised Selected Papers 11","author":"Frachtenberg Eitan","year":"2005","unstructured":"Eitan Frachtenberg and Dror G. Feitelson. 2005. Pitfalls in parallel job scheduling evaluation. In Job Scheduling Strategies for Parallel Processing: 11th International Workshop, JSSPP 2005, Cambridge, MA, USA, June 19, 2005, Revised Selected Papers 11. Springer, 257\u2013282."},{"key":"e_1_3_4_15_2","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/ScalA49573.2019.00013","volume-title":"2019 IEEE\/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA\u201919)","author":"Gainaru Ana","year":"2019","unstructured":"Ana Gainaru and Guillaume Pallez. 2019. Making speculative scheduling robust to incomplete data. In 2019 IEEE\/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA\u201919). 
IEEE, 62\u201371."},{"key":"e_1_3_4_16_2","first-page":"1","volume-title":"Proceedings of the 48th International Conference on Parallel Processing","author":"Gainaru Ana","year":"2019","unstructured":"Ana Gainaru, Guillaume Pallez, Hongyang Sun, and Padma Raghavan. 2019. Speculative scheduling for stochastic HPC applications. In Proceedings of the 48th International Conference on Parallel Processing. 1\u201310."},{"issue":"10","key":"e_1_3_4_17_2","doi-asserted-by":"crossref","first-page":"2304","DOI":"10.1109\/TPDS.2018.2820699","article-title":"Online tuning of EASY-backfilling using queue reordering policies","volume":"29","author":"Gaussier Eric","year":"2018","unstructured":"Eric Gaussier, J\u00e9r\u00f4me Lelong, Valentin Reis, and Denis Trystram. 2018. Online tuning of EASY-backfilling using queue reordering policies. IEEE Transactions on Parallel and Distributed Systems 29, 10 (2018), 2304\u20132316.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD55451.2022.00035"},{"issue":"6","key":"e_1_3_4_19_2","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1109\/MCSE.2013.95","article-title":"Exascale computing trends: Adjusting to the \u201cnew normal\u201d for computer architecture","volume":"15","author":"Kogge Peter","year":"2013","unstructured":"Peter Kogge and John Shalf. 2013. Exascale computing trends: Adjusting to the \u201cnew normal\u201d for computer architecture. Computing in Science & Engineering 15, 6 (2013), 16\u201326.","journal-title":"Computing in Science & Engineering"},{"key":"e_1_3_4_20_2","first-page":"686","volume-title":"2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201919)","author":"Legrand Arnaud","year":"2019","unstructured":"Arnaud Legrand, Denis Trystram, and Salah Zrigui. 2019. Adapting batch scheduling to workload characteristics: What can we expect from online learning? 
In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS\u201919). IEEE, 686\u2013695."},{"key":"e_1_3_4_21_2","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1109\/ICPPW.2010.48","volume-title":"2010 39th International Conference on Parallel Processing Workshops","author":"Leung Vitus J.","year":"2010","unstructured":"Vitus J. Leung, Gerald Sabin, and Ponnuswamy Sadayappan. 2010. Parallel job scheduling policies to improve fairness: A case study. In 2010 39th International Conference on Parallel Processing Workshops. IEEE, 346\u2013353."},{"issue":"6","key":"e_1_3_4_22_2","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1109\/71.932708","article-title":"Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling","volume":"12","author":"Mu\u2019alem Ahuva W.","year":"2001","unstructured":"Ahuva W. Mu\u2019alem and Dror G. Feitelson. 2001. Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Transactions on Parallel and Distributed Systems 12, 6 (2001), 529\u2013543.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_4_23_2","doi-asserted-by":"crossref","DOI":"10.1017\/9781108264907","volume-title":"The Science of Qualitative Research","author":"Packer Martin J.","year":"2017","unstructured":"Martin J. Packer. 2017. The Science of Qualitative Research. Cambridge University Press."},{"key":"e_1_3_4_24_2","first-page":"1","volume-title":"SC20: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Patel Tirthak","year":"2020","unstructured":"Tirthak Patel, Zhengchun Liu, Raj Kettimuthu, Paul Rich, William Allcock, and Devesh Tiwari. 2020. Job characteristics on large-scale systems: Long-term analysis, quantification, and implications. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 
IEEE, 1\u201317."},{"key":"e_1_3_4_25_2","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1109\/SC.2000.10041","volume-title":"SC\u201900: Proceedings of the 2000 ACM\/IEEE Conference on Supercomputing","author":"Perkovic Dejan","year":"2000","unstructured":"Dejan Perkovic and Peter J. Keleher. 2000. Randomization, speculation, and adaptation in batch schedulers. In SC\u201900: Proceedings of the 2000 ACM\/IEEE Conference on Supercomputing. IEEE, 7\u20137."},{"key":"e_1_3_4_26_2","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1145\/3531146.3533158","volume-title":"2022 ACM Conference on Fairness, Accountability, and Transparency","author":"Raji Inioluwa Deborah","year":"2022","unstructured":"Inioluwa Deborah Raji, I. Elizabeth Kumar, Aaron Horowitz, and Andrew Selbst. 2022. The fallacy of AI functionality. In 2022 ACM Conference on Fairness, Accountability, and Transparency. 959\u2013972."},{"key":"e_1_3_4_27_2","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/3-540-39997-6_3","volume-title":"Workshop on Job Scheduling Strategies for Parallel Processing","author":"Rudolph Larry","year":"2000","unstructured":"Larry Rudolph and Paul H. Smith. 2000. Valuation of ultra-scale computing systems. In Workshop on Job Scheduling Strategies for Parallel Processing. Springer, 39\u201355."},{"key":"e_1_3_4_28_2","first-page":"1","volume-title":"2009 IEEE International Conference on Cluster Computing and Workshops","author":"Tang Wei","year":"2009","unstructured":"Wei Tang, Zhiling Lan, Narayan Desai, and Daniel Buettner. 2009. Fault-aware, utility-based job scheduling on blue, gene\/p systems. In 2009 IEEE International Conference on Cluster Computing and Workshops. 
IEEE, 1\u201310."},{"issue":"6","key":"e_1_3_4_29_2","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1109\/TPDS.2007.70606","article-title":"Backfilling using system-generated predictions rather than user runtime estimates","volume":"18","author":"Tsafrir Dan","year":"2007","unstructured":"Dan Tsafrir, Yoav Etsion, and Dror G. Feitelson. 2007. Backfilling using system-generated predictions rather than user runtime estimates. IEEE Transactions on Parallel and Distributed Systems 18, 6 (2007), 789\u2013803.","journal-title":"IEEE Transactions on Parallel and Distributed Systems"},{"key":"e_1_3_4_30_2","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1109\/CLUSTER.2014.6968735","volume-title":"2014 IEEE International Conference on Cluster Computing (CLUSTER\u201914)","author":"Verma Abhishek","year":"2014","unstructured":"Abhishek Verma, Madhukar Korupolu, and John Wilkes. 2014. Evaluating job packing in warehouse-scale computing. In 2014 IEEE International Conference on Cluster Computing (CLUSTER\u201914). IEEE, 48\u201356."},{"key":"e_1_3_4_31_2","first-page":"1","volume-title":"SC20: International Conference for High Performance Computing, Networking, Storage and Analysis","author":"Zhang Di","year":"2020","unstructured":"Di Zhang, Dong Dai, Youbiao He, Forrest Sheng Bao, and Bing Xie. 2020. RLScheduler: An automated HPC batch job scheduler using reinforcement learning. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 1\u201315."},{"key":"e_1_3_4_32_2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1145\/3502181.3531470","volume-title":"Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201922)","author":"Zhang Di","year":"2022","unstructured":"Di Zhang, Dong Dai, and Bing Xie. 2022. SchedInspector: A batch job scheduling inspector using reinforcement learning. 
In Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing (HPDC\u201922). Association for Computing Machinery, New York, NY, 97\u2013109."},{"key":"e_1_3_4_33_2","first-page":"133","volume-title":"Proceedings 14th International Parallel and Distributed Processing Symposium (IPDPS\u201900)","author":"Zhang Yanyong","year":"2000","unstructured":"Yanyong Zhang, Hubertus Franke, Jos\u00e9 E. Moreira, and Anand Sivasubramaniam. 2000. Improving parallel job scheduling by combining gang scheduling and backfilling techniques. In Proceedings 14th International Parallel and Distributed Processing Symposium (IPDPS\u201900). IEEE, 133\u2013142."},{"key":"e_1_3_4_34_2","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1109\/HPDC.1999.805303","volume-title":"Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No. 99TH8469)","author":"Zotkin Dmitry","year":"1999","unstructured":"Dmitry Zotkin and Peter J. Keleher. 1999. Job-length estimation and performance in backfilling schedulers. In Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No. 99TH8469). 
IEEE, 236\u2013243."}],"container-title":["ACM Transactions on Modeling and Performance Evaluation of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701986","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3701986","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:16Z","timestamp":1750298236000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3701986"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,11]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3701986"],"URL":"https:\/\/doi.org\/10.1145\/3701986","relation":{},"ISSN":["2376-3639","2376-3647"],"issn-type":[{"value":"2376-3639","type":"print"},{"value":"2376-3647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,11]]},"assertion":[{"value":"2024-02-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-14","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}