{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:04:36Z","timestamp":1750309476833,"version":"3.41.0"},"reference-count":16,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,7,30]],"date-time":"2024-07-30T00:00:00Z","timestamp":1722297600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2024,7,30]]},"abstract":"<jats:p>We consider the problem of fine-grained hardware profiling, i.e., profiling the hardware while the desired section of the program is executing. Although this requirement is frequently encountered in practice, its importance has not been emphasized in literature so far. In this work, we compare and validate three tools for performing fine-grained profiling on Linux platforms - perf, PAPI, and a homegrown tool PMU-metrics. perf has been used in the past for fine-grained profiling in an erroneous manner, producing inaccurate metrics as a result. On the other hand, PAPI and PMU-metrics produce accurate metrics for profiling at the ms-scale, while PMU-metrics enables profiling even at the \u00b5s-scale. 
Thus, we hope that our analysis will help systems practitioners choose the right tool for performing fine-grained profiling at different time scales.<\/jats:p>","DOI":"10.1145\/3685980.3685986","type":"journal-article","created":{"date-parts":[[2024,7,31]],"date-time":"2024-07-31T10:27:34Z","timestamp":1722421654000},"page":"38-43","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Fine-Grained Hardware Profiling - Are You Using the Right Tools?"],"prefix":"10.1145","volume":"53","author":[{"given":"Aarati","family":"Kakaraparthy","sequence":"first","affiliation":[{"name":"University of Wisconsin, Madison, Madison, WI, USA"}]},{"given":"Jignesh M.","family":"Patel","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,7,31]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"BLARE codebase. github.com\/mush-zhang\/Blare\/tree\/main\/original_codebase."},{"key":"e_1_2_1_2_1","unstructured":"clock_gettime(3) -- Linux manual page. https:\/\/tinyurl.com\/yyvkc2wz."},{"key":"e_1_2_1_3_1","unstructured":"Counting CPU cycles with perf event in C. https:\/\/tinyurl.com\/46azwvn6."},{"key":"e_1_2_1_4_1","unstructured":"perf event source code. https:\/\/tinyurl.com\/2bc557nj."},{"key":"e_1_2_1_5_1","unstructured":"perf_event_open(2) -- Linux manual page. https:\/\/tinyurl.com\/29f64vsm."},{"key":"e_1_2_1_6_1","unstructured":"RDPMC -- Read Performance-Monitoring Counters. https:\/\/tinyurl.com\/6rc495ud."},{"key":"e_1_2_1_7_1","unstructured":"US Accidents dataset (2016--2019). https:\/\/tinyurl.com\/2n4rv5cd."},{"key":"e_1_2_1_8_1","unstructured":"Use Linux's high resolution clock -- clock_gettime. https:\/\/tinyurl.com\/3emmhdm5."},{"key":"e_1_2_1_9_1","unstructured":"WRMSR -- Write to Model Specific Register. 
https:\/\/tinyurl.com\/nfund4fe."},{"volume-title":"https:\/\/github.com\/UWHustle\/pmu-metrics","year":"2022","key":"e_1_2_1_10_1","unstructured":"The PMU-metrics Library. https:\/\/github.com\/UWHustle\/pmu-metrics, 2022."},{"key":"e_1_2_1_11_1","volume-title":"Slides from Linux Kongress","author":"De Melo A. C.","year":"2010","unstructured":"A. C. De Melo. The New Linux perf tools. In Slides from Linux Kongress, volume 18, 2010."},{"key":"e_1_2_1_12_1","volume-title":"VIP Hashing -- Adapting to Skew in Popularity of Data on the Fly (extended version). arXiv","author":"Kakaraparthy A.","year":"2022","unstructured":"A. Kakaraparthy, J. M. Patel, B. P. Kroth, and K. Park. VIP Hashing -- Adapting to Skew in Popularity of Data on the Fly (extended version). arXiv, 2022."},{"key":"e_1_2_1_13_1","volume-title":"Proc. VLDB Endow., 11(13)","author":"Kersten T.","year":"2019","unstructured":"T. Kersten, V. Leis, A. Kemper, T. Neumann, A. Pavlo, and P. Boncz. Everything You Always Wanted to Know about Compiled and Vectorized Queries but Were Afraid to Ask. Proc. VLDB Endow., 11(13), 2019."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/2556549.2556555"},{"key":"e_1_2_1_15_1","volume-title":"Tools for High Performance Computing","author":"Terpstra D.","year":"2009","unstructured":"D. Terpstra, H. Jagode, H. You, and J. Dongarra. Collecting Performance Data with PAPI-C. In Tools for High Performance Computing 2009. 
Springer Berlin Heidelberg."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589297"}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3685980.3685986","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3685980.3685986","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:26Z","timestamp":1750295846000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3685980.3685986"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,30]]},"references-count":16,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,7,30]]}},"alternative-id":["10.1145\/3685980.3685986"],"URL":"https:\/\/doi.org\/10.1145\/3685980.3685986","relation":{},"ISSN":["0163-5808"],"issn-type":[{"type":"print","value":"0163-5808"}],"subject":[],"published":{"date-parts":[[2024,7,30]]},"assertion":[{"value":"2024-07-31","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}