{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,2]],"date-time":"2025-03-02T05:36:45Z","timestamp":1740893805899,"version":"3.38.0"},"reference-count":51,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T00:00:00Z","timestamp":1759795200000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100006192","name":"DOE Advanced Scientific Computing Research, ECP PROTEA-TUNE, SciDAC RAPIDS and OASIS","doi-asserted-by":"publisher","award":["DE-AC02-06CH11357"],"award-info":[{"award-number":["DE-AC02-06CH11357"]}],"id":[{"id":"10.13039\/100006192","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:p> Ytopt is a Python machine-learning-based autotuning software package developed within the ECP PROTEAS-TUNE project. The ytopt software adopts an asynchronous search framework that consists of sampling a small number of input parameter configurations and progressively fitting a surrogate model over the input-output space until exhausting the user-defined maximum number of evaluations or the wall-clock time. libEnsemble is a Python toolkit for coordinating workflows of asynchronous and dynamic ensembles of calculations across massively parallel resources developed within the ECP PETSc\/TAO project. libEnsemble helps users take advantage of massively parallel resources to solve design, decision, and inference problems and expands the class of problems that can benefit from increased parallelism. 
In this paper we present our methodology and framework for integrating ytopt and libEnsemble to take advantage of massively parallel resources and accelerate the autotuning process. Specifically, we focus on using the proposed framework to autotune the ECP ExaSMR application OpenMC, an open source Monte Carlo particle transport code. OpenMC has seven tunable parameters, some of which have large ranges, such as the number of particles in-flight, which ranges from 100,000 to 8\u00a0million with a default setting of 1\u00a0million. Finding the combination of parameter values that achieves the best performance is extremely time-consuming. Therefore, we apply the proposed framework to autotune the MPI\/OpenMP offload version of OpenMC based on a user-defined metric such as the figure of merit (FoM) (particles\/s) or the energy efficiency metric energy-delay product (EDP) on Crusher at Oak Ridge Leadership Computing Facility. The experimental results show improvements of up to 29.49% in FoM and up to 30.44% in EDP. 
<\/jats:p>","DOI":"10.1177\/10943420241286476","type":"journal-article","created":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T10:27:38Z","timestamp":1728383258000},"page":"79-103","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["Integrating ytopt and libEnsemble to autotune OpenMC"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8150-5171","authenticated-orcid":false,"given":"Xingfu","family":"Wu","sequence":"first","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"John R","family":"Tramm","sequence":"additional","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne National Laboratory, Lemont, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9924-2082","authenticated-orcid":false,"given":"Jeffrey","family":"Larson","sequence":"additional","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne National Laboratory, Lemont, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9916-9038","authenticated-orcid":false,"given":"John-Luke","family":"Navarro","sequence":"additional","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Prasanna","family":"Balaprakash","sequence":"additional","affiliation":[{"name":"Artificial Intelligence Programs, Oak Ridge National Laboratory, Oak Ridge, TN, USA"}]},{"given":"Brice","family":"Videau","sequence":"additional","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Michael","family":"Kruse","sequence":"additional","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Paul","family":"Hovland","sequence":"additional","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne 
National Laboratory, Lemont, IL, USA"}]},{"given":"Valerie","family":"Taylor","sequence":"additional","affiliation":[{"name":"Mathematics & Computer Science Division,Argonne National Laboratory, Lemont, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3058-7573","authenticated-orcid":false,"given":"Mary","family":"Hall","sequence":"additional","affiliation":[{"name":"School of Computing, University of Utah, Salt Lake City, UT, USA"}]}],"member":"179","published-online":{"date-parts":[[2024,10,7]]},"reference":[{"key":"bibr1-10943420241286476","unstructured":"APEX (2023) APEX: autonomic performance environment for eXascale. https:\/\/uo-oaciss.github.io\/apex\/."},{"key":"bibr2-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1145\/3197978"},{"key":"bibr3-10943420241286476","doi-asserted-by":"crossref","unstructured":"Balaprakash P, B Gramacy R, Wild SM (2013) Active-learning-based surrogate models for empirical performance running. In: Proceedings of the 2013 IEEE international conference on cluster computing (CLUSTER\u201913), Indianapolis, Indiana, 23\u201327 September 2013.","DOI":"10.1109\/CLUSTER.2013.6702683"},{"key":"bibr4-10943420241286476","unstructured":"Chang TH, Larson J, Watson LT, et al. (2020) Managing computationally expensive blackbox multiobjective optimization problems with libEnsemble. In: Proceedings of the spring simulation conference, Fairfax, VA, 18\u201321 May 2020."},{"key":"bibr5-10943420241286476","doi-asserted-by":"crossref","unstructured":"Chen C, Chame J, Hall M (2005) Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy. In: Proceedings of international symposium on code generation and optimization, San Jose, CA, 20\u201323 March 2005.","DOI":"10.1109\/CGO.2005.10"},{"key":"bibr6-10943420241286476","doi-asserted-by":"crossref","unstructured":"Chung I, Hollingsworth JK (2006) A case study using automatic performance tuning for large-scale scientific programs. 
In: Proceedings of the 15th IEEE international symposium on high performance distributed computing (HPDC\u201906), Paris, France, 19\u201323 June 2006.","DOI":"10.1109\/HPDC.2006.1652135"},{"key":"bibr7-10943420241286476","unstructured":"ConfigSpace (2023) https:\/\/github.com\/automl\/ConfigSpace."},{"key":"bibr8-10943420241286476","unstructured":"Cray XC40 Theta (2022) Argonne National Laboratory. https:\/\/www.alcf.anl.gov\/theta."},{"volume-title":"Frontier TDS System","year":"2023","author":"Crusher","key":"bibr9-10943420241286476"},{"key":"bibr10-10943420241286476","unstructured":"ECP (2023) U.S. DOE exascale computing project. https:\/\/www.exascaleproject.org."},{"key":"bibr11-10943420241286476","unstructured":"ECP Proxy Applications Suite (2022) https:\/\/proxyapps.exascaleproject.org\/ecp-proxy-apps-suite\/."},{"key":"bibr12-10943420241286476","unstructured":"ExaSMR (2023) ExaSMR. https:\/\/www.exascaleproject.org\/research-project\/exasmr\/."},{"key":"bibr13-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4029"},{"key":"bibr14-10943420241286476","unstructured":"Ferran Pousa A, Jalas S, Kirchen M, et al. (2022) Multitask optimization of laser-plasma accelerators using simulation codes with different fidelities. In: Proceedings of the 13th international particle accelerator conference, Bangkok, Thailand, 12\u201317 June 2022."},{"volume-title":"HPE Cray EX System Frontier","year":"2024","author":"Frontier","key":"bibr15-10943420241286476"},{"key":"bibr16-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1551"},{"key":"bibr17-10943420241286476","unstructured":"Hoogenboom JE, Martin WR, Petrovic B (2011) The Monte Carlo performance benchmark test\u2014aims, specifications, and first results. In: Proceedings of int. conf. on mathematics and computational methods applied to nuclear science and engineering. 
Rio de Janeiro, Brazil, 2011."},{"key":"bibr18-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2021.3082815"},{"key":"bibr19-10943420241286476","doi-asserted-by":"publisher","DOI":"10.21105\/joss.06031"},{"volume-title":"libEnsemble Users Manual. Version 1.1.0","year":"2024","author":"Hudson S","key":"bibr20-10943420241286476"},{"key":"bibr21-10943420241286476","unstructured":"Katarzynski J, Cytowski M (2014) Towards autotuning of OpenMP applications on multicore architectures. https:\/\/arxiv.org\/abs\/1401.4063."},{"key":"bibr22-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13374-9_21"},{"key":"bibr23-10943420241286476","doi-asserted-by":"crossref","unstructured":"Liu Y, Sid-Lakhdar WM, Marques O, et al. (2021) GPTune: multitask learning for autotuning exascale applications. In: Proceedings of the 26th ACM SIGPLAN symposium on principles and practice of parallel programming(PPoPP\u201921), Republic of Korea, 27 February\u20133 March 2021, pp. 234\u2013246.","DOI":"10.1145\/3437801.3441621"},{"key":"bibr24-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-92040-5_3"},{"key":"bibr25-10943420241286476","doi-asserted-by":"crossref","unstructured":"Muralidharan S, Shantharam M, Hall M, et al. (2014) Nitro: a framework for adaptive code variant tuning. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium (IPDPS\u201914), Phoenix, Arizona, 19\u201323 May 2014.","DOI":"10.1109\/IPDPS.2014.59"},{"key":"bibr26-10943420241286476","doi-asserted-by":"crossref","unstructured":"Mustafa D, Aurangzeb A, Eigenmann R (2011) Performance analysis and tuning of automatically parallelized OpenMP applications. In: Proceedings of the 7th international conference on OpenMP in the petascale era, Chicago, IL, June 2011.","DOI":"10.1007\/978-3-642-21487-5_12"},{"key":"bibr27-10943420241286476","unstructured":"Neveu N, Hudson S, Larson J, et al. 
(2019) Comparison of model-based and heuristic optimization algorithms applied to photoinjectors using libEnsemble. In: Proceedings of the 13th international computational accelerator physics conference, Key West, FL, 20\u201324 October 2018, pp. 22\u201324."},{"key":"bibr28-10943420241286476","doi-asserted-by":"crossref","unstructured":"Ogilvie WF, Petoumenos P, Wang Z, et al. (2017) Minimizing the cost of iterative compilation with active learning. In: Proceedings. of the 2017 international symposium on code generation and optimization, Austin, TX, 4\u20138 February 2017.","DOI":"10.1109\/CGO.2017.7863744"},{"key":"bibr29-10943420241286476","unstructured":"OpenMC (2022) https:\/\/openmc.org. https:\/\/github.com\/jtramm\/openmc_offloading_builder\/tree\/main."},{"key":"bibr30-10943420241286476","unstructured":"PETSC\/TAO (2023) ECP PETSC\/TAO. https:\/\/www.exascaleproject.org\/research-project\/petsc-tao\/."},{"key":"bibr31-10943420241286476","unstructured":"PROTEAS-TUNE (2023) Ecp proteas-tune. https:\/\/www.exascaleproject.org\/research-project\/proteas-tune\/."},{"key":"bibr32-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1016\/j.anucene.2014.07.048"},{"key":"bibr33-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1145\/3453483.3454109"},{"key":"bibr34-10943420241286476","unstructured":"Slurm (2022) Slurm workload manager. https:\/\/slurm.schedmd.com."},{"key":"bibr35-10943420241286476","doi-asserted-by":"crossref","unstructured":"Sreenivasan V, Javali R, Hall M, et al. (2019) A framework for enabling OpenMP autotuning. In: Proceedings of OpenMP: conquering the full hardware spectrum (IWOMP\u201919), Auckland, New Zealand, 11\u201313 September 2019.","DOI":"10.1007\/978-3-030-28596-8_4"},{"key":"bibr36-10943420241286476","unstructured":"Subprocess Management (2022) subprocess.Popen.communicate. 
https:\/\/docs.python.org\/3\/library\/subprocess.html."},{"key":"bibr37-10943420241286476","unstructured":"Summit (2022) Oak Ridge National Laboratory. https:\/\/www.olcf.ornl.gov\/olcf-resources\/compute-systems\/summit\/."},{"volume-title":"IBM Power9 Heterogeneous System Summit","year":"2023","author":"Summit","key":"bibr38-10943420241286476"},{"key":"bibr39-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2002.10062"},{"volume-title":"Cray XC40 Theta","year":"2023","author":"Theta","key":"bibr40-10943420241286476"},{"key":"bibr41-10943420241286476","doi-asserted-by":"crossref","unstructured":"Tiwari A, Hollingsworth JK (2011) Online adaptive code generation and tuning. In: Proceedings of the 2011 IEEE international parallel & distributed processing symposium (IPDPS\u201911), Anchorage, Alaska, 16\u201320 May 2011.","DOI":"10.1109\/IPDPS.2011.86"},{"key":"bibr42-10943420241286476","doi-asserted-by":"crossref","unstructured":"Tiwari A, Chen C, Chame J, et al. (2009) A scalable auto-tuning framework for compiler optimization. In: Proceedings of the 23rd IEEE international parallel and distributed computing symposium (IPDPS\u201909), Rome, 23\u201329 May 2009.","DOI":"10.1109\/IPDPS.2009.5161054"},{"key":"bibr43-10943420241286476","doi-asserted-by":"crossref","unstructured":"Tramm JR, Romano PK, Doerfert J, et al. (2022) Toward portable GPU acceleration of the OpenMC Monte Carlo particle transport code. In: PHYSOR 2022 \u2013 international conference on physics of reactors, Pittsburgh, Pennsylvania, 15\u201320 May 2022.","DOI":"10.13182\/PHYSOR22-37847"},{"key":"bibr44-10943420241286476","doi-asserted-by":"crossref","unstructured":"Whaley RC, Dongarra J (1998) Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM\/IEEE conference on supercomputing (SC\u201998). 
https:\/\/dl.acm.org\/citation.cfm?id=509058.509096.","DOI":"10.1109\/SC.1998.10004"},{"key":"bibr45-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1109\/PMBS54543.2021.00017"},{"key":"bibr46-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER49012.2020.00068"},{"key":"bibr47-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.6683"},{"key":"bibr48-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.6683"},{"key":"bibr49-10943420241286476","unstructured":"Wu X, Balaprakash P, Kruse M, et al. (2023a) ytopt: autotuning scientific applications for energy efficiency at large scales. In: Proceedings of cray user group conference 2023. Helsinki, Finland, 7\u201311 May 2023."},{"key":"bibr50-10943420241286476","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3626079"},{"key":"bibr51-10943420241286476","unstructured":"ytopt (2023) ytopt: a machine-learning-based autotuning software package. https:\/\/github.com\/ytopt-team\/ytopt."}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241286476","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420241286476","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241286476","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241286476","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T09:53:58Z","timestamp":1740822838000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420241286476"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,7]]},"references-count":51,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.1177\/10943420241286476"],"URL":"https:\/\/doi.org\/10.1177\/10943420241286476","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2024,10,7]]}}}