{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,3]],"date-time":"2025-06-03T06:05:35Z","timestamp":1748930735676,"version":"3.40.3"},"publisher-location":"Cham","reference-count":23,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031695766"},{"type":"electronic","value":"9783031695773"}],"license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,26]],"date-time":"2024-08-26T00:00:00Z","timestamp":1724630400000},"content-version":"vor","delay-in-days":238,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Leveraging hardware performance counters provides valuable insights into system resource utilization, aiding performance analysis and tuning for parallel applications. The available counters vary with architecture and are collected at execution time. Their abundance and the limited number of registers for measurement make gathering laborious and costly. Efficient characterization of parallel regions necessitates a dimension reduction strategy. While recent efforts have focused on manually reducing the number of counters for specific architectures, this paper introduces a novel approach: an automatic dimension reduction technique for efficiently characterizing parallel code regions across diverse architectures. The methodology is based on Machine Learning ensembles because of their precision and ability at capturing different relationships between the input features and the target variables. Evaluation results show that ensembles can successfully reduce the number of hardware performance counters that characterize a code region. We validate our approach on CPUs using a comprehensive dataset of OpenMP regions, showing that any region can be accurately characterized by 8 relevant hardware performance counters. In addition, we also apply the proposed methodology on GPUs using a reduced set of kernels, demonstrating its effectiveness across various hardware configurations and workloads.<\/jats:p>","DOI":"10.1007\/978-3-031-69577-3_2","type":"book-chapter","created":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T19:02:05Z","timestamp":1724612525000},"page":"18-32","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Efficient Code Region Characterization Through Automatic Performance Counters Reduction Using Machine Learning Techniques"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2224-5730","authenticated-orcid":false,"given":"Suren","family":"Harutyunyan","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9729-8557","authenticated-orcid":false,"given":"Eduardo","family":"C\u00e9sar","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0090-4109","authenticated-orcid":false,"given":"Anna","family":"Sikora","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5703-9673","authenticated-orcid":false,"given":"Ji\u0159\u00ed","family":"Filipovi\u010d","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0947-1182","authenticated-orcid":false,"given":"Akash","family":"Dutta","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8672-5317","authenticated-orcid":false,"given":"Ali","family":"Jannesari","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9640-6763","authenticated-orcid":false,"given":"Jordi","family":"Alcaraz","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,8,26]]},"reference":[{"key":"2_CR1","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1007\/978-3-030-29400-7_6","volume-title":"Euro-Par 2019: Parallel Processing","author":"J Alcaraz","year":"2019","unstructured":"Alcaraz, J., Sikora, A., C\u00e9sar, E.: Hardware counters\u2019 space reduction for code region characterization. In: Yahyapour, R. (ed.) Euro-Par 2019. LNCS, vol. 11725, pp. 74\u201386. Springer, Cham (2019). https:\/\/doi.org\/10.1007\/978-3-030-29400-7_6"},{"key":"2_CR2","doi-asserted-by":"crossref","unstructured":"Alcaraz, J., et al.: Predicting number of threads using balanced datasets for openmp regions. PDP Spec. Issue 2021 Comput. 105(5), 999\u20131017 (2023)","DOI":"10.1007\/s00607-022-01081-6"},{"key":"2_CR3","unstructured":"Arik, S.\u00d6., Pfister, T.: Tabnet: attentive interpretable tabular learning. arXiv preprint arXiv:1908.07442 (2019)"},{"key":"2_CR4","doi-asserted-by":"crossref","unstructured":"Breiman, L.: Random forests. Mach. Learn. 45, 5\u201332 (2001)","DOI":"10.1023\/A:1010933404324"},{"key":"2_CR5","doi-asserted-by":"crossref","unstructured":"Calotoiu, A., Hoefler, T., Poke, M., Wolf, F.: Using automated performance modeling to find scalability bugs in complex codes. In: Proceedings of the ACM\/IEEE Conference on SC13, Denver, pp. 1\u201312. ACM (2013)","DOI":"10.1145\/2503210.2503277"},{"key":"2_CR6","doi-asserted-by":"crossref","unstructured":"Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 16), pp. 785\u2013794. ACM, New York (2016)","DOI":"10.1145\/2939672.2939785"},{"issue":"2","key":"2_CR7","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","volume":"20","author":"DR Cox","year":"1958","unstructured":"Cox, D.R.: The regression analysis of binary sequences. J. Roy. Stat. Soc.: Ser. B (Methodol.) 20(2), 215\u2013232 (1958)","journal-title":"J. Roy. Stat. Soc.: Ser. B (Methodol.)"},{"key":"2_CR8","doi-asserted-by":"crossref","unstructured":"Dutta, A., Alcaraz, J., TehraniJamsaz, A., Sikora, A., Cesar, E., Jannesari, A.: Pattern-based autotuning of openmp loops using graph neural networks. In: 2022 IEEE\/ACM International Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S), pp. 26\u201331 (2022)","DOI":"10.1109\/AI4S56813.2022.00010"},{"key":"2_CR9","doi-asserted-by":"crossref","unstructured":"Filipovi\u010d, J., Hozzov\u00e1, J.A.N., O\u013eha, J., Petrovi\u010d, F.: Using hardware performance counters to speed up autotuning convergence on gpus. J. Parall. Distrib. Comput. 160, 16\u201335 (2022)","DOI":"10.1016\/j.jpdc.2021.10.003"},{"issue":"1","key":"2_CR10","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1080\/00401706.1970.10488634","volume":"12","author":"AE Hoerl","year":"1970","unstructured":"Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12(1), 55\u201367 (1970)","journal-title":"Technometrics"},{"key":"2_CR11","doi-asserted-by":"crossref","unstructured":"Kjeldsberg, P.G., Gocht, A., Gerndt, M., Riha, L., Schuchart, J., Mian, U.S.: Readex: linking two ends of the computing continuum to improve energy-efficiency in dynamic applications. In: Design, Automation Test in Europe Conference Exhibition, 2017, pp. 109\u2013114 (2017)","DOI":"10.23919\/DATE.2017.7926967"},{"key":"2_CR12","unstructured":"McCalpin, J.: Memory bandwidth and machine balance in high performance computers. In: IEEE Technical Committee on Computer Architecture Newsletter, pp. 19\u201325 (1995)"},{"key":"2_CR13","doi-asserted-by":"crossref","unstructured":"Miceli, R., et al.: Autotune: a plugin-driven approach to the automatic tuning of parallel applications. In: Proceedings of the 11th International Workshop on the State-of-the-Art in Scientific and Parallel Computing (PARA 2012), vol. 7782, 328\u2013342 (2013)","DOI":"10.1007\/978-3-642-36803-5_24"},{"key":"2_CR14","unstructured":"Mucci, P., Moore, S., Deane, C., Ho, G.: Papi: a portable interface to hardware performance counters (1999)"},{"key":"2_CR15","volume":"22","author":"F Petrovi\u010d","year":"2023","unstructured":"Petrovi\u010d, F., Filipovi\u010d, J.: Kernel tuning toolkit. SoftwareX 22, 101385 (2023)","journal-title":"Kernel tuning toolkit. SoftwareX"},{"key":"2_CR16","doi-asserted-by":"crossref","unstructured":"Petrovi\u010d, F., et al.: A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with kernel tuning toolkit. Futur. Gener. Comput. Syst. 108, 161\u2013177 (2020)","DOI":"10.1016\/j.future.2020.02.069"},{"issue":"3","key":"2_CR17","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/MCAS.2006.1688199","volume":"6","author":"R Polikar","year":"2006","unstructured":"Polikar, R.: Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 6(3), 21\u201345 (2006)","journal-title":"IEEE Circuits Syst. Mag."},{"key":"2_CR18","doi-asserted-by":"crossref","unstructured":"Popov, M., Akel, C., Chatelain, Y., Jalby, W., de\u00a0Oliveira\u00a0Castro, P.: Piecewise holistic autotuning of parallel programs with CERE. Concurr. Comput. Pract. Exp. 29 (2017)","DOI":"10.1002\/cpe.4190"},{"issue":"1","key":"2_CR19","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267\u2013288 (1996)","journal-title":"J. Roy. Stat. Soc.: Ser. B (Methodol.)"},{"key":"2_CR20","doi-asserted-by":"publisher","unstructured":"Wood, C., et al.: Artemis: automatic runtime tuning of\u00a0parallel execution parameters using\u00a0machine learning. In: Chamberlain, B.L., Varbanescu, A.-L., Ltaief, H., Luszczek, P. (eds.) ISC High Performance 2021. LNCS, vol. 12728, pp. 453\u2013472. Springer, Cham (2021). https:\/\/doi.org\/10.1007\/978-3-030-78713-4_24","DOI":"10.1007\/978-3-030-78713-4_24"},{"key":"2_CR21","unstructured":"Yuki, T., Pouchet, L.N.: Polybench 4.0 (2015). https:\/\/web.cse.ohio-state.edu\/~pouchet.2\/software\/polybench\/"},{"key":"2_CR22","unstructured":"Yuki, T.: Understanding PolyBench\/C 3.2 kernels. In: Rajopadhye, S., Verdoolaege, S. (eds.) Proceedings of the 4th International Workshop on Polyhedral Compilation Techniques, Vienna (2014)"},{"key":"2_CR23","doi-asserted-by":"crossref","unstructured":"Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B (Statist. Methodol.) 67(2), 301\u2013320 (2005)","DOI":"10.1111\/j.1467-9868.2005.00503.x"}],"container-title":["Lecture Notes in Computer Science","Euro-Par 2024: Parallel Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-69577-3_2","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T19:08:41Z","timestamp":1724612921000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-69577-3_2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"ISBN":["9783031695766","9783031695773"],"references-count":23,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-69577-3_2","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"26 August 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"Euro-Par","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"European Conference on Parallel Processing","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Madrid","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Spain","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 August 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"30 August 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"30","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"europar2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/2024.euro-par.org\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}