{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T16:34:13Z","timestamp":1778603653312,"version":"3.51.4"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T00:00:00Z","timestamp":1737676800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T00:00:00Z","timestamp":1737676800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Spanish Science and Technology Commission","award":["PID2022-136454NB-C21"],"award-info":[{"award-number":["PID2022-136454NB-C21"]}]},{"name":"Spanish Science and Technology Commission","award":["PID2022-136454NB-C21"],"award-info":[{"award-number":["PID2022-136454NB-C21"]}]},{"name":"Ministerio de Ciencia e Innovaci\u00f3n; Proyectos de Transici\u00f3n Ecol\u00f3gica y Digital 2021","award":["TED2021-131176B-I00"],"award-info":[{"award-number":["TED2021-131176B-I00"]}]},{"name":"Ministerio de Ciencia e Innovaci\u00f3n; Proyectos de Transici\u00f3n Ecol\u00f3gica y Digital 2021","award":["TED2021-131176B-I00"],"award-info":[{"award-number":["TED2021-131176B-I00"]}]},{"DOI":"10.13039\/501100006365","name":"Universidad de Cantabria","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006365","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In recent years, energy consumption has become a limiting factor in the evolution of high-performance computing (HPC) clusters in terms of environmental concern and maintenance cost. The computing power of these clusters is increasing, together with the demands of the workloads they execute. A key component in HPC systems is the workload manager, whose operation has a substantial impact on the performance and energy consumption of the clusters. Recent research has employed machine learning techniques to optimise the operation of this component. However, these attempts have focused on homogeneous clusters where all the cores are pooled together and considered equal, disregarding the fact that they are contained in nodes and that they can have different performances. This work presents an intelligent job scheduler based on deep reinforcement learning that focuses on reducing energy consumption of heterogeneous HPC clusters. To this aim it leverages information provided by the users as well as the power consumption specifications of the compute resources of the cluster. The scheduler is evaluated against a set of heuristic algorithms showing that it has potential to give similar results, even in the face of the extra complexity of the heterogeneous cluster.<\/jats:p>","DOI":"10.1007\/s11227-024-06907-y","type":"journal-article","created":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T03:37:06Z","timestamp":1737689826000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Intelligent energy pairing scheduler (InEPS) for heterogeneous HPC clusters"],"prefix":"10.1007","volume":"81","author":[{"given":"Marta","family":"L\u00f3pez","sequence":"first","affiliation":[]},{"given":"Esteban","family":"Stafford","sequence":"additional","affiliation":[]},{"given":"Jose Luis","family":"Bosque","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,24]]},"reference":[{"issue":"3","key":"6907_CR1","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1007\/s11227-011-0593-5","volume":"58","author":"JL Bosque","year":"2011","unstructured":"Bosque JL, Robles OD, Toharia P, Pastor L (2011) Evaluating scalability in heterogeneous systems. J Supercomput 58(3):367\u2013375","journal-title":"J Supercomput"},{"issue":"6","key":"6907_CR2","doi-asserted-by":"publisher","first-page":"4078","DOI":"10.1016\/j.rser.2012.03.014","volume":"16","author":"M Uddin","year":"2012","unstructured":"Uddin M, Rahman AA (2012) Energy efficiency and low carbon enabler green it framework for data centers considering green metrics. Renew Sustain Energy Rev 16(6):4078\u20134094","journal-title":"Renew Sustain Energy Rev"},{"issue":"1","key":"6907_CR3","doi-asserted-by":"publisher","first-page":"208","DOI":"10.1016\/j.future.2012.06.003","volume":"29","author":"M Witkowski","year":"2013","unstructured":"Witkowski M, Oleksiak A, Piontek T, Weglarz J (2013) Practical power consumption estimation for real life hpc applications. Futur Gener Comput Syst 29(1):208\u2013217","journal-title":"Futur Gener Comput Syst"},{"key":"6907_CR4","unstructured":"Agency IE (2022) Data Centres and Data Transmission Networks"},{"issue":"3","key":"6907_CR5","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1016\/S0022-0000(75)80008-0","volume":"10","author":"JD Ullman","year":"1975","unstructured":"Ullman JD (1975) Np-complete scheduling problems. J Comput Syst Sci 10(3):384\u2013393","journal-title":"J Comput Syst Sci"},{"issue":"6","key":"6907_CR6","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1109\/71.932708","volume":"12","author":"AW Mu\u2019alem","year":"2001","unstructured":"Mu\u2019alem AW, Feitelson DG (2001) Utilization, predictability, workloads, and user runtime estimates in scheduling the ibm sp2 with backfilling. IEEE Trans Parallel Distrib Syst 12(6):529\u2013543","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"6907_CR7","volume-title":"Scheduling Algorithms","author":"P Brucker","year":"2013","unstructured":"Brucker P (2013) Scheduling Algorithms. Springer, Berlin, Heidelberg"},{"key":"6907_CR8","doi-asserted-by":"crossref","unstructured":"Sun H, Elghazi R, Gainaru A, Aupy G, Raghavan P (2018) Scheduling parallel tasks under multiple resources: list scheduling vs. pack scheduling. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 194\u2013203","DOI":"10.1109\/IPDPS.2018.00029"},{"key":"6907_CR9","doi-asserted-by":"crossref","unstructured":"Mao H, Alizadeh M, Menache I, Kandula S (2016) Resource management with deep reinforcement learning. In: Proceedings of the 15th ACM Workshop on Hot Topics in Networks. HotNets \u201916, Association for Computing Machinery, New York, NY, USA, pp 50\u201356","DOI":"10.1145\/3005745.3005750"},{"key":"6907_CR10","unstructured":"Ye Y, Ren X, Wang J, Xu L, Guo W, Huang W, Tian W (2018) A New Approach for Resource Scheduling with Deep Reinforcement Learning. arXiv: 1806.08122"},{"key":"6907_CR11","volume-title":"Pattern Recognition and Machine Learning","author":"CM Bishop","year":"2006","unstructured":"Bishop CM (2006) Pattern Recognition and Machine Learning. Springer, New York"},{"key":"6907_CR12","doi-asserted-by":"crossref","unstructured":"Zhang D, Dai D, He Y, Bao FS, Xie B (2020) RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning. arXiv: 1910.08925","DOI":"10.1109\/SC41405.2020.00035"},{"issue":"20","key":"6907_CR13","doi-asserted-by":"publisher","first-page":"9448","DOI":"10.3390\/app11209448","volume":"11","author":"Q Wang","year":"2021","unstructured":"Wang Q, Zhang H, Qu C, Shen Y, Liu X, Li J (2021) Rlschert: An hpc job scheduler using deep reinforcement learning and remaining time prediction. Appl Sci 11(20):9448","journal-title":"Appl Sci"},{"issue":"12","key":"6907_CR14","doi-asserted-by":"publisher","first-page":"4903","DOI":"10.1109\/TPDS.2022.3205325","volume":"33","author":"Y Fan","year":"2022","unstructured":"Fan Y, Li B, Favorite D, Singh N, Childers T, Rich P, Allcock W, Papka ME, Lan Z (2022) Dras: deep reinforcement learning for cluster scheduling in high performance computing. IEEE Trans Parallel Distrib Syst 33(12):4903\u20134917","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"12","key":"6907_CR15","doi-asserted-by":"publisher","first-page":"2624","DOI":"10.1109\/TPDS.2019.2922606","volume":"30","author":"S Maroulis","year":"2019","unstructured":"Maroulis S, Zacheilas N, Kalogeraki V (2019) A holistic energy-efficient real-time scheduler for mixed stream and batch processing workloads. IEEE Trans Parallel Distrib Syst 30(12):2624\u20132635","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"6907_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2021.107630","volume":"97","author":"JC Salinas-Hilburg","year":"2022","unstructured":"Salinas-Hilburg JC, Zapater M, Moya JM, Ayala JL (2022) Energy-aware task scheduling in data centers using an application signature. Comput Electr Eng 97:107630","journal-title":"Comput Electr Eng"},{"issue":"10","key":"6907_CR17","doi-asserted-by":"publisher","first-page":"13738","DOI":"10.1007\/s11227-024-05988-z","volume":"80","author":"E Stafford","year":"2024","unstructured":"Stafford E, Bosque JL (2024) Enhancing heterogeneous cluster efficiency through node-centric scheduling. J Supercomput 80(10):13738\u201313753","journal-title":"J Supercomput"},{"key":"6907_CR18","doi-asserted-by":"crossref","unstructured":"Herrera A, Ib\u00e1\u00f1ez M, Stafford E, Bosque JL (2021) A simulator for intelligent workload managers in heterogeneous clusters. In: IEEE\/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp 196\u2013205","DOI":"10.1109\/CCGrid51090.2021.00029"},{"key":"6907_CR19","volume-title":"Reinforcement Learning: An Introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction. A Bradford Book, Cambridge"},{"key":"6907_CR20","doi-asserted-by":"crossref","unstructured":"Fomperosa J, Iba\u00f1ez M, Stafford E, Bosque JL (2023) Task scheduler for\u00a0heterogeneous data centres based on\u00a0deep reinforcement learning. In: Parallel Processing and Applied Mathematics. Springer, Berlin, Heidelberg, pp 237\u2013248","DOI":"10.1007\/978-3-031-30442-2_18"},{"key":"6907_CR21","doi-asserted-by":"crossref","unstructured":"Castillo E, Alvarez L, Moret\u00f3 M, Casas M, Vallejo E, Bosque JL, Beivide R, Valero M (2018) Architectural support for task dependence management with flexible software scheduling. In: IEEE International Symposium on High Performance Computer Architecture, HPCA 2018, Vienna, Austria, February 24\u201328, pp 283\u2013295","DOI":"10.1109\/HPCA.2018.00033"},{"key":"6907_CR22","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal Policy Optimization Algorithms. arXiv: 1707:06347"},{"key":"6907_CR23","unstructured":"Kingma DP, Ba J (2017) Adam: A Method for Stochastic Optimization. arXiv: 1412.6980"},{"key":"6907_CR24","unstructured":"Izmailov P, Podoprikhin D, Garipov T, Vetrov D, Wilson AG (2019) Averaging Weights Leads to Wider Optima and Better Generalization. arXiv: 1803.05407"},{"key":"6907_CR25","unstructured":"Bick D (2021) Towards Delivering a Coherent Self-Contained Explanation of Proximal Policy Optimization. Master\u2019s thesis, Rijksuniversiteit Groningen, Netherlands"},{"key":"6907_CR26","unstructured":"Fan Y (2021) Job Scheduling in High Performance Computing. arXiv: 2109.09269"},{"issue":"11","key":"6907_CR27","doi-asserted-by":"publisher","first-page":"8787","DOI":"10.1007\/s11227-020-03175-4","volume":"76","author":"E Stafford","year":"2020","unstructured":"Stafford E, Bosque JL (2020) Improving utilization of heterogeneous clusters. J Supercomput 76(11):8787\u20138800","journal-title":"J Supercomput"},{"issue":"12","key":"6907_CR28","doi-asserted-by":"publisher","first-page":"2624","DOI":"10.1109\/TPDS.2019.2922606","volume":"30","author":"S Maroulis","year":"2019","unstructured":"Maroulis S, Zacheilas N, Kalogeraki V (2019) A holistic energy-efficient real-time scheduler for mixed stream and batch processing workloads. IEEE Trans Parallel Distrib Syst 30(12):2624\u20132635","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"3","key":"6907_CR29","doi-asserted-by":"publisher","first-page":"1104","DOI":"10.1007\/s11227-013-0881-3","volume":"65","author":"JL Bosque","year":"2013","unstructured":"Bosque JL, Toharia P, Robles OD, Pastor L (2013) A load index and load balancing algorithm for heterogeneous clusters. J Supercomput 65(3):1104\u20131113","journal-title":"J Supercomput"},{"key":"6907_CR30","doi-asserted-by":"crossref","unstructured":"Tang W, Lan Z, Desai N, Buettner D (2009) Fault-aware, utility-based job scheduling on blue, gene\/p systems. In: 2009 IEEE International Conference on Cluster Computing and Workshops, pp 1\u201310","DOI":"10.1109\/CLUSTR.2009.5289206"},{"issue":"6","key":"6907_CR31","doi-asserted-by":"publisher","first-page":"789","DOI":"10.1109\/TPDS.2007.70606","volume":"18","author":"D Tsafrir","year":"2007","unstructured":"Tsafrir D, Etsion Y, Feitelson DG (2007) Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans Parallel Distrib Syst 18(6):789\u2013803","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"6907_CR32","doi-asserted-by":"crossref","unstructured":"Mao H, Schwarzkopf M, Venkatakrishnan SB, Meng Z, Alizadeh M (2019) Learning scheduling algorithms for data processing clusters. ACM Special Interest Group on Data Communication. SIGCOMM \u201919. Association for Computing Machinery, New York, pp 270\u2013288","DOI":"10.1145\/3341302.3342080"},{"key":"6907_CR33","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2021.102329","volume":"122","author":"J Peng","year":"2022","unstructured":"Peng J, Li K, Chen J, Li K (2022) Hea-pas: A hybrid energy allocation strategy for parallel applications scheduling on heterogeneous computing systems. J Syst Architect 122:102329","journal-title":"J Syst Architect"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06907-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-024-06907-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06907-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T03:37:22Z","timestamp":1737689842000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-024-06907-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,24]]},"references-count":33,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,1]]}},"alternative-id":["6907"],"URL":"https:\/\/doi.org\/10.1007\/s11227-024-06907-y","relation":{},"ISSN":["1573-0484"],"issn-type":[{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,24]]},"assertion":[{"value":"27 December 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 January 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}}],"article-number":"427"}}