{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T14:39:34Z","timestamp":1774967974859,"version":"3.50.1"},"reference-count":54,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T00:00:00Z","timestamp":1759881600000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100006192","name":"Advanced Scientific Computing Research","doi-asserted-by":"publisher","award":["17-SC-20-SC"],"award-info":[{"award-number":["17-SC-20-SC"]}],"id":[{"id":"10.13039\/100006192","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000083","name":"Directorate for Computer and Information Science and Engineering","doi-asserted-by":"publisher","award":["1550588"],"award-info":[{"award-number":["1550588"]}],"id":[{"id":"10.13039\/100000083","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000083","name":"Directorate for Computer and Information Science and Engineering","doi-asserted-by":"publisher","award":["2004894"],"award-info":[{"award-number":["2004894"]}],"id":[{"id":"10.13039\/100000083","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006224","name":"Argonne National Laboratory","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006224","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:p> Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities. We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how their application should respond to events (e.g., task completion) as a series of cooperative agents. In this paper, we describe the design of Colmena, the challenges we overcame while deploying applications on exascale systems, and the science workflows we have enhanced through interweaving AI. The scaling challenges we discuss include developing steering strategies that maximize node utilization, introducing data fabrics that reduce communication overhead of data-intensive tasks, and implementing workflow tasks that cache costly operations between invocations. These innovations coupled with a variety of application patterns accessible through our agent-based steering model have enabled science advances in chemistry, biophysics, and materials science using different types of AI. Our vision is that Colmena will spur creative solutions that harness AI across many domains of scientific computing. <\/jats:p>","DOI":"10.1177\/10943420241288242","type":"journal-article","created":{"date-parts":[[2024,10,10]],"date-time":"2024-10-10T01:15:40Z","timestamp":1728522940000},"page":"52-64","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":5,"title":["Employing artificial intelligence to steer exascale workflows with colmena"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1323-5939","authenticated-orcid":false,"given":"Logan","family":"Ward","sequence":"first","affiliation":[{"name":"Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6547-6902","authenticated-orcid":false,"given":"J. Gregory","family":"Pauloski","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Chicago, Chicago, IL, USA"}]},{"given":"Valerie","family":"Hayot-Sasson","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Chicago, Chicago, IL, USA"}]},{"given":"Yadu","family":"Babuji","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Chicago, Chicago, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9873-9177","authenticated-orcid":false,"given":"Alexander","family":"Brace","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Chicago, Chicago, IL, USA"}]},{"given":"Ryan","family":"Chard","sequence":"additional","affiliation":[{"name":"Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA"}]},{"given":"Kyle","family":"Chard","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Chicago, Chicago, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5532-3048","authenticated-orcid":false,"given":"Rajeev","family":"Thakur","sequence":"additional","affiliation":[{"name":"Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2129-5269","authenticated-orcid":false,"given":"Ian","family":"Foster","sequence":"additional","affiliation":[{"name":"Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA"}]}],"member":"179","published-online":{"date-parts":[[2024,10,8]]},"reference":[{"key":"bibr1-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2019.0056"},{"key":"bibr2-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1177\/10943420211029302"},{"key":"bibr3-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/works56498.2022.00009"},{"key":"bibr4-10943420241288242","doi-asserted-by":"crossref","unstructured":"Babuji Y, Woodard A, Li Z, et al. (2019) Parsl: pervasive parallel programming in Python. In: ACM International Symposium on High-Performance Parallel and Distributed Computing, Pisa, Italy, June 3-7.","DOI":"10.1145\/3307681.3325400"},{"key":"bibr5-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1016\/b978-0-323-88457-0.00003-5"},{"key":"bibr6-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2018.00014"},{"key":"bibr7-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-018-2507-5"},{"key":"bibr8-10943420241288242","unstructured":"Brace A, Pauloski JG (2023) https:\/\/github.com\/braceal\/parsl_object_registry."},{"key":"bibr9-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1145\/3468267.3470578"},{"key":"bibr10-10943420241288242","first-page":"806","volume-title":"IEEE International Parallel and Distributed Processing Symposium","author":"Brace A","year":"2022"},{"key":"bibr11-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1002\/qua.24952"},{"key":"bibr12-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1177\/10943420211006452"},{"key":"bibr13-10943420241288242","doi-asserted-by":"crossref","unstructured":"Chard R, Babuji Y, Li Z, et al. (2020) funcX: a federated function serving fabric for science. In: 29th Intl Symp on High-Performance Parallel Dist Computing, Stockholm Sweden, June 23 - 26, 2020.","DOI":"10.1145\/3369583.3392683"},{"key":"bibr14-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1103\/physrevlett.91.135503"},{"key":"bibr15-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3624238"},{"key":"bibr16-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3626087"},{"key":"bibr17-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1021\/acs.chemmater.0c00768"},{"key":"bibr18-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1177\/10943420221128233"},{"key":"bibr19-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-biophys-042910-155245"},{"key":"bibr20-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1088\/2515-7639\/ab0c3d"},{"key":"bibr21-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR56361.2022.9956231"},{"key":"bibr22-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/ipdpsw.2019.00081"},{"key":"bibr23-10943420241288242","unstructured":"Garden Project (2023) https:\/\/thegardens.ai\/."},{"key":"bibr24-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1021\/acscentsci.7b00572"},{"key":"bibr25-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1039\/d3dd00123g"},{"key":"bibr26-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/ccgrid.2010.116"},{"key":"bibr27-10943420241288242","first-page":"37","volume":"8","author":"Hospital A","year":"2015","journal-title":"Advances and Applications in Bioinformatics and Chemistry"},{"key":"bibr28-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/tpds.2021.3082815"},{"key":"bibr30-10943420241288242","unstructured":"Hugging Face (2023) https:\/\/huggingface.co\/."},{"key":"bibr31-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1103\/physrevlett.120.026102"},{"key":"bibr32-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/e-science58273.2023.10254910"},{"key":"bibr33-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/DLS49591.2019.00007"},{"key":"bibr34-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1126\/science.ade2574"},{"key":"bibr35-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1007\/s10723-015-9329-8"},{"key":"bibr36-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1039\/d0sc01101k"},{"key":"bibr37-10943420241288242","first-page":"561","volume-title":"13th USENIX OSDI","author":"Moritz P","year":"2018"},{"key":"bibr29-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1038\/s42004-023-01090-2"},{"key":"bibr38-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2022.101707"},{"key":"bibr39-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607047"},{"key":"bibr40-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2002.10019"},{"key":"bibr41-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/works54523.2021.00008"},{"key":"bibr42-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1007\/s11837-022-05368-z"},{"key":"bibr43-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jctc.1c01154"},{"key":"bibr44-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jpcb.8b06521"},{"key":"bibr45-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1063\/1.5099132"},{"key":"bibr46-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1177\/10943420221113513"},{"key":"bibr47-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/mlhpc54614.2021.00007"},{"key":"bibr48-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/ipdpsw59300.2023.00018"},{"key":"bibr49-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2023.100875"},{"key":"bibr50-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-018-2508-4"},{"key":"bibr51-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1109\/WORKS54523.2021.00009"},{"key":"bibr52-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1145\/3447818.3460370"},{"key":"bibr53-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2020.107206"},{"key":"bibr54-10943420241288242","doi-asserted-by":"publisher","DOI":"10.1177\/10943420231201154"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241288242","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420241288242","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241288242","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241288242","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T01:14:46Z","timestamp":1740791686000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420241288242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,8]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.1177\/10943420241288242"],"URL":"https:\/\/doi.org\/10.1177\/10943420241288242","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,8]]}}}