{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:46:05Z","timestamp":1775231165264,"version":"3.50.1"},"reference-count":38,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T00:00:00Z","timestamp":1680480000000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100000015","name":"Department of Energy","doi-asserted-by":"crossref","award":["DE-AC52-07NA27344 (LLNL-JRNL-817993-DRAFT)"],"award-info":[{"award-number":["DE-AC52-07NA27344 (LLNL-JRNL-817993-DRAFT)"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["#1841758"],"award-info":[{"award-number":["#1841758"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["#1900888"],"award-info":[{"award-number":["#1900888"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2022,5]]},"abstract":"<jats:p> Traditional workload managers do not have the capacity to consider how IO contention can increase job runtime and even cause entire resource allocations to be wasted. Whether from bursts of IO demand or parallel file systems (PFS) performance degradation, IO contention must be identified and addressed to ensure maximum performance. In this paper, we present AI4IO (AI for IO), a suite of tools using AI methods to prevent and mitigate performance losses due to IO contention. AI4IO enables existing workload managers to become IO-aware. Currently, AI4IO consists of two tools: PRIONN and CanarIO. PRIONN predicts IO contention and empowers schedulers to prevent it. CanarIO mitigates the impact of IO contention when it does occur. We measure the effectiveness of AI4IO when integrated into Flux, a next-generation scheduler, for both small- and large-scale IO-intensive job workloads. Our results show that integrating AI4IO into Flux improves the workload makespan up to 6.4%, which can account for more than 18,000 node-h of saved resources per week on a production cluster in our large-scale workload. <\/jats:p>","DOI":"10.1177\/10943420221079765","type":"journal-article","created":{"date-parts":[[2022,4,4]],"date-time":"2022-04-04T03:39:30Z","timestamp":1649043570000},"page":"370-387","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":6,"title":["AI4IO: A suite of AI-based tools for IO-aware scheduling"],"prefix":"10.1177","volume":"36","author":[{"suffix":"II","given":"Michael R","family":"Wyatt","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Tennessee Knoxville, Knoxville, TN, USA"},{"name":"Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0141-0653","authenticated-orcid":false,"given":"Stephen","family":"Herbein","sequence":"additional","affiliation":[{"name":"Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA"}]},{"given":"Todd","family":"Gamblin","sequence":"additional","affiliation":[{"name":"Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0031-6377","authenticated-orcid":false,"given":"Michela","family":"Taufer","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Tennessee Knoxville, Knoxville, TN, USA"}]}],"member":"179","published-online":{"date-parts":[[2022,4,3]]},"reference":[{"key":"bibr1-10943420221079765","unstructured":"Quartz (2020) Quartz supercomputer: characteristics. Available at: https:\/\/hpc.llnl.gov\/hardware\/platforms\/quartz (accessed 23 December 2020)."},{"key":"bibr2-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2020.04.006"},{"key":"bibr3-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/385\/1\/012010"},{"key":"bibr4-10943420221079765","unstructured":"Blagodurov S, Fedorova A (2013) Optimizing shared resource contention in HPC clusters. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Denver, CO, 22 November 2013, pp. 1\u201325."},{"key":"bibr5-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.46"},{"key":"bibr6-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/HUSTProtools51951.2020.00013"},{"key":"bibr7-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/DSNW.2013.6615513"},{"key":"bibr8-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2016.08.010"},{"key":"bibr9-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.62"},{"key":"bibr10-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/HPCS.2018.00062"},{"key":"bibr11-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.27"},{"key":"bibr12-10943420221079765","doi-asserted-by":"publisher","DOI":"10.21236\/ADA603824"},{"key":"bibr13-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1145\/2907294.2907316"},{"key":"bibr14-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2019.8891051"},{"key":"bibr15-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/MDSO.2004.1301253"},{"key":"bibr16-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00077"},{"key":"bibr17-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.32"},{"key":"bibr18-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23400-2_10"},{"key":"bibr19-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2010.98"},{"key":"bibr20-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2016.58"},{"key":"bibr21-10943420221079765","unstructured":"Mikolov T, Sutskever I, Chen K, et al. (2013) Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th Neural Information Processing Systems Conference, Lake Tahoe, Nevada, 5\u201310 December 2013, pp. 3111\u20133119."},{"key":"bibr22-10943420221079765","doi-asserted-by":"publisher","DOI":"10.2172\/1184188"},{"key":"bibr23-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2019.10.007"},{"key":"bibr24-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.25"},{"key":"bibr25-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2543725"},{"key":"bibr26-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2017.00084"},{"key":"bibr27-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2016.10"},{"key":"bibr28-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1145\/1254882.1254939"},{"key":"bibr29-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/HUST.2016.006"},{"key":"bibr30-10943420221079765","doi-asserted-by":"crossref","unstructured":"Smith W, Foster I, Taylor V (1998) Predicting application runtimes using historical information. In: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, Orlando, FL, 30 March 1998, pp. 122\u2013142.","DOI":"10.1007\/BFb0053984"},{"key":"bibr31-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-47954-6_11"},{"key":"bibr32-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1145\/3322798.3329258"},{"key":"bibr33-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2007.70606"},{"key":"bibr34-10943420221079765","volume-title":"Unstructured Data Analytics for Next-Generation HPC Schedulers: Capturing Jobs\u2019 Needs from Unstructured Job Scripts","author":"Wyatt MR","year":"2017"},{"key":"bibr35-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1145\/3225058.3225091"},{"key":"bibr36-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00018"},{"key":"bibr37-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2012.6232370"},{"key":"bibr38-10943420221079765","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2019.00-16"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221079765","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420221079765","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221079765","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221079765","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T00:20:21Z","timestamp":1738023621000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420221079765"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,3]]},"references-count":38,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,5]]}},"alternative-id":["10.1177\/10943420221079765"],"URL":"https:\/\/doi.org\/10.1177\/10943420221079765","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,3]]}}}