{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T16:51:22Z","timestamp":1775667082580,"version":"3.50.1"},"reference-count":40,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2024,12,4]],"date-time":"2024-12-04T00:00:00Z","timestamp":1733270400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"Exascale Computing Project","award":["17-SC-20-SC"],"award-info":[{"award-number":["17-SC-20-SC"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:p> The Performance Application Programming Interface (PAPI) serves as a coherent, operating-system-independent interface for accessing performance counter data across a wide range of hardware and software components. PAPI can operate autonomously as a performance monitoring library and tool for application analysis. However, its true value emerges when it functions as a middleware for numerous third-party profiling, tracing, and sampling toolkits, establishing itself as a universal interface for hardware counter analysis. In this role, PAPI manages the intricacies of each hardware component, presenting a streamlined API to higher-level toolkits. Within the Exascale Computing Project (ECP), PAPI has expanded its capabilities in performance counter monitoring and incorporated support for power management across cutting-edge hardware and software technologies. This includes performance and power monitoring for AMD GPUs through integration with AMD ROCm and ROCm-SMI, Intel Ponte Vecchio GPUs via Intel\u2019s oneAPI Level Zero, and NVIDIA GPUs through the CUPTI Profiling API. 
Additionally, PAPI is compatible with interconnects, the latest CPUs, and ARM chips. These enhancements have been implemented while preserving the standard PAPI interface and methodology for utilizing low-level performance counters in CPUs, GPUs, on\/off-chip memory, interconnects, and the I\/O system, encompassing energy and power management. To strengthen PAPI\u2019s sustainability, ECP has facilitated its integration into Spack and E4S, ensuring software robustness through continuous integration and continuous deployment. In addition to hardware counter-based data, PAPI now supports the registration and monitoring of Software-Defined Events. This feature exposes the internal behavior of runtime systems and libraries such as PaRSEC, SLATE, and MAGMA to applications utilizing those libraries, broadening the scope of performance events to include software-based information. Additionally, PAPI has been expanded with the Counter Analysis Toolkit, aiding in native performance counter disambiguation through micro-benchmarks. These micro-benchmarks probe various essential aspects of modern chips, contributing to the classification of raw performance events. In summary, ECP has enabled PAPI to include comprehensive counter analysis capabilities, advanced performance and power monitoring support for exascale hardware components, and broadened the scope of performance events to encompass not only hardware-related metrics but also software-based information. 
<\/jats:p>","DOI":"10.1177\/10943420241303884","type":"journal-article","created":{"date-parts":[[2024,12,4]],"date-time":"2024-12-04T10:51:15Z","timestamp":1733309475000},"page":"251-268","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":10,"title":["Advancements of PAPI for the exascale generation"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8173-9434","authenticated-orcid":false,"given":"Heike","family":"Jagode","sequence":"first","affiliation":[{"name":"Innovative Computing Laboratory (ICL), University of Tennessee, Knoxville, TN, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9846-0066","authenticated-orcid":false,"given":"Anthony","family":"Danalis","sequence":"additional","affiliation":[{"name":"Innovative Computing Laboratory (ICL), University of Tennessee, Knoxville, TN, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7165-7591","authenticated-orcid":false,"given":"Giuseppe","family":"Congiu","sequence":"additional","affiliation":[{"name":"Innovative Computing Laboratory (ICL), University of Tennessee, Knoxville, TN, USA"}]},{"given":"Daniel","family":"Barry","sequence":"additional","affiliation":[{"name":"Innovative Computing Laboratory (ICL), University of Tennessee, Knoxville, TN, USA"}]},{"given":"Anthony","family":"Castaldo","sequence":"additional","affiliation":[{"name":"Synopsys Inc, Sunnyvale, CA, USA"}]},{"given":"Jack","family":"Dongarra","sequence":"additional","affiliation":[{"name":"Innovative Computing Laboratory (ICL), University of Tennessee, Knoxville, TN, USA"}]}],"member":"179","published-online":{"date-parts":[[2024,12,4]]},"reference":[{"key":"bibr1-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/180\/1\/012037"},{"key":"bibr2-10943420241303884","unstructured":"AMD (2024a) AMD\u2019s Radeon open compute platform. 
https:\/\/rocm.docs.amd.com\/en\/latest\/."},{"key":"bibr3-10943420241303884","unstructured":"AMD (2024b) AMD\u2019s ROCm system management interface. https:\/\/rocm.docs.amd.com\/projects\/rocm_smi_lib\/en\/latest\/."},{"key":"bibr4-10943420241303884","unstructured":"AMD (2024c) hipBLAS: basic linear algebra on AMD GPUs. https:\/\/rocm.docs.amd.com\/projects\/hipBLAS\/en\/latest\/."},{"key":"bibr5-10943420241303884","unstructured":"AMD (2024d) Omnitrace: application profiling, tracing, and analysis. https:\/\/github.com\/AMDResearch\/omnitrace."},{"key":"bibr6-10943420241303884","doi-asserted-by":"crossref","unstructured":"Barry D, Danalis A, Jagode H (2021) Effortless monitoring of arithmetic intensity with PAPI\u2019s counter analysis toolkit. In: Tools for High Performance Computing 2018\/2019, Dresden, Germany, 195\u2013218.","DOI":"10.1007\/978-3-030-66057-4_11"},{"key":"bibr7-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW59300.2023.00070"},{"key":"bibr8-10943420241303884","doi-asserted-by":"crossref","unstructured":"Barry D, Danalis A, Dongarra J (2024) Automated data analysis for defining performance metrics from raw hardware events. 
In: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), San Francisco, CA, USA.","DOI":"10.1109\/IPDPSW63119.2024.00134"},{"key":"bibr9-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.46"},{"key":"bibr10-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2013.98"},{"key":"bibr11-10943420241303884","doi-asserted-by":"publisher","DOI":"10.3233\/978-1-61499-324-7-65"},{"key":"bibr12-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-09766-4_60"},{"key":"bibr13-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2019.00069"},{"key":"bibr14-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807623"},{"key":"bibr15-10943420241303884","doi-asserted-by":"crossref","unstructured":"Gates M, Kurzak J, Charara A, et al. (2019) Slate: design of a modern distributed and accelerated linear algebra library. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1\u201318.","DOI":"10.1145\/3295500.3356223"},{"key":"bibr16-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1556"},{"key":"bibr17-10943420241303884","doi-asserted-by":"crossref","unstructured":"Haidar A, Jagode H, YarKhan A, et al. (2017) Power-aware computing: measurement, control, and performance analysis for intel Xeon Phi. In: 2017 IEEE High Performance Extreme Computing Conference (HPEC \u201917), Waltham, MA, USA, 12\u201314 September 2017.","DOI":"10.1109\/HPEC.2017.8091085"},{"key":"bibr18-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.4485"},{"key":"bibr19-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1109\/ESPM256814.2022.00008"},{"key":"bibr20-10943420241303884","unstructured":"Intel (2024a) Intel VTune profiler. 
https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/oneapi\/vtune-profiler.html."},{"key":"bibr21-10943420241303884","unstructured":"Intel (2024b) Intel\u2019s oneAPI level zero interface. https:\/\/spec.oneapi.io\/level-zero\/latest\/index.html."},{"key":"bibr22-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA.2008.136"},{"key":"bibr23-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-39589-0_4"},{"key":"bibr24-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019846287"},{"key":"bibr25-10943420241303884","first-page":"213","volume-title":"Proceedings of the International Supercomputing Conference 2013","author":"Jagode-McCraw H","year":"2013"},{"key":"bibr26-10943420241303884","doi-asserted-by":"crossref","unstructured":"Jagode-McCraw H, Ralph J, Danalis A, et al. (2014) Power monitoring with PAPI for Extreme scale architectures and dataflow-based programming Models. In: Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications (HPCMASPA 2014), Madrid, Spain, 22\u201326 September 2014, pp. 385\u2013391. IEEE Cluster 2014.","DOI":"10.1109\/CLUSTER.2014.6968672"},{"key":"bibr27-10943420241303884","first-page":"1","volume-title":"Proceedings of the Cray User Group 2003","author":"Kaufmann S","year":"2003"},{"key":"bibr28-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1021\/acs.chemrev.0c00998"},{"key":"bibr29-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2011.71"},{"key":"bibr30-10943420241303884","unstructured":"Molnar I (2009) perf: Linux profiling with performance counters. https:\/\/perf.wiki.kernel.org\/."},{"key":"bibr31-10943420241303884","unstructured":"NVIDIA (2024a) cuBLAS: basic linear algebra on NVIDIA GPUs. https:\/\/developer.nvidia.com\/cublas."},{"key":"bibr32-10943420241303884","unstructured":"NVIDIA (2024b) NVIDIA Nsight systems. 
https:\/\/developer.nvidia.com\/nsight-systems."},{"key":"bibr33-10943420241303884","unstructured":"NVIDIA (2024c) NVIDIA\u2019s CUDA profiling tools interface. https:\/\/developer.nvidia.com\/cupti."},{"key":"bibr34-10943420241303884","unstructured":"NVIDIA (2024d) NVIDIA\u2019s Nsight perf SDK. https:\/\/developer.nvidia.com\/nsight-perf-sdk."},{"key":"bibr35-10943420241303884","doi-asserted-by":"publisher","DOI":"10.3233\/978-1-61499-381-0-773"},{"key":"bibr36-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1177\/1094342006064482"},{"key":"bibr37-10943420241303884","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11261-4_11"},{"key":"bibr38-10943420241303884","doi-asserted-by":"crossref","unstructured":"Treibig J, Hager G, Wellein G (2010) LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In: Proc. Of the First International Workshop on Parallel Software Tools and Tool Infrastructures.","DOI":"10.1109\/ICPPW.2010.38"},{"key":"bibr39-10943420241303884","doi-asserted-by":"crossref","unstructured":"Vanecek S, Schulz M (2023) Sys-sage: a fresh view on dynamic topologies & attributes of HPC systems. In: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Poster Session.","DOI":"10.1145\/3650200.3656627"},{"key":"bibr40-10943420241303884","doi-asserted-by":"crossref","unstructured":"Willenbring J, Shende S, Spear W, et al. (2023) E4S: extreme-scale scientific software stack. 
https:\/\/www.osti.gov\/biblio\/2432176.","DOI":"10.2172\/2432176"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241303884","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420241303884","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420241303884","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T18:13:37Z","timestamp":1741112017000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420241303884"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,4]]},"references-count":40,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["10.1177\/10943420241303884"],"URL":"https:\/\/doi.org\/10.1177\/10943420241303884","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,4]]}}}