{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T14:40:03Z","timestamp":1760712003922,"version":"build-2065373602"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"name":"Institute of Information & communications Technology Planning & Evaluation (IITP) through grants from the Korea government","award":["RS-2024-00459026 and RS-2024-00456287"],"award-info":[{"award-number":["RS-2024-00459026 and RS-2024-00456287"]}]},{"DOI":"10.13039\/100004358","name":"Samsung Electronics Co., Ltd","doi-asserted-by":"crossref","award":["IO221213-04119-01"],"award-info":[{"award-number":["IO221213-04119-01"]}],"id":[{"id":"10.13039\/100004358","id-type":"DOI","asserted-by":"crossref"}]},{"name":"IITP through a grant from the Korean government","award":["No.2022-0-00991"],"award-info":[{"award-number":["No.2022-0-00991"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Accurate estimation of performance and energy consumption is critical for optimizing application efficiency on diverse hardware platforms. Traditional methods often rely on profiling and measurements, requiring at least one execution, making them time-consuming and resource-intensive. This article introduces the Deep Power Meter (DeepPM) framework, leveraging deep learning, specifically the Transformer architecture, to predict performance and energy consumption of basic blocks directly from compiled binaries, eliminating the need for explicit measurement processes. The DeepPM model effectively learns the performance and energy consumption of basic blocks, enabling accurate predictions for each. Furthermore, the framework enhances applicability across different ISAs and microarchitectures, addressing limitations of state-of-the-art ML-based techniques restricted to specific processor architectures. Experimental results using the SPEC CPU 2017 benchmark suite show that DeepPM achieves significantly lower prediction errors compared to state-of-the-art ML-based techniques, with a 24% improvement in performance and an 18% improvement in energy consumption for x86 basic blocks, and similar gains for ARM processors. Fine-tuning with minimal data from the Phoronix Test Suite further validates DeepPM\u2019s robustness, achieving an error of approximately 13.7%, close to the fully trained model\u2019s 13.3% error. These findings demonstrate DeepPM\u2019s ability to enhance the accuracy and efficiency of performance and energy consumption predictions, making it a valuable tool for optimizing computing systems across diverse hardware environments.<\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3725887","type":"journal-article","created":{"date-parts":[[2025,3,28]],"date-time":"2025-03-28T05:46:39Z","timestamp":1743140799000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["DeepPM: Predicting Performance and Energy Consumption of Program Binaries Using Transformers"],"prefix":"10.1145","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-9630-321X","authenticated-orcid":false,"given":"Jun S.","family":"Shim","sequence":"first","affiliation":[{"name":"Dept. of Computer Science and Engineering, Seoul National University","place":["Seoul, Korea (the Republic of)"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6861-4377","authenticated-orcid":false,"given":"Hyeonji","family":"Chang","sequence":"additional","affiliation":[{"name":"Dept. of Computer Science and Engineering, Seoul National University","place":["Seoul, Korea (the Republic of)"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5947-9632","authenticated-orcid":false,"given":"Yeseong","family":"Kim","sequence":"additional","affiliation":[{"name":"Electrical Engineering and Computer Science, DGIST","place":["Daegu, Korea (the Republic of)"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7977-9883","authenticated-orcid":false,"given":"Jihong","family":"Kim","sequence":"additional","affiliation":[{"name":"Dept. of Computer Science and Engineering, Seoul National University","place":["Seoul, Korea (the Republic of)"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,17]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"National Security Agency. 2023. Ghidra. GitHub. Retrieved March 31 2025 from https:\/\/github.com\/NationalSecurityAgency\/ghidra"},{"key":"e_1_3_2_3_2","first-page":"1","volume-title":"Proceedings of the 2014 Design, Automation and Test in Europe Conference and Exhibition (DATE)","author":"Alam Faisal","year":"2014","unstructured":"Faisal Alam, Preeti Ranjan Panda, Nikhil Tripathi, Namita Sharma, and Sanjiv Narayan. 2014. Energy optimization in android applications through wakelock placement. In Proceedings of the 2014 Design, Automation and Test in Europe Conference and Exhibition (DATE). IEEE, 1\u20134."},{"key":"e_1_3_2_4_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877\u20131901.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_5_2","first-page":"167","volume-title":"Proceedings of the 2019 IEEE International Symposium on Workload Characterization (IISWC)","author":"Chen Yishen","year":"2019","unstructured":"Yishen Chen, Ajay Brahmakshatriya, Charith Mendis, Alex Renda, Eric Atkinson, Ond\u0159ej S\u1ef3kora, Saman Amarasinghe, and Michael Carbin. 2019. BHive: A benchmark suite and measurement framework for validating x86-64 basic block performance models. In Proceedings of the 2019 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 167\u2013177."},{"key":"e_1_3_2_6_2","volume-title":"Intel 64 and IA-32 Architectures Software Developer\u2019s Manuals","author":"Corporation Intel","year":"2024","unstructured":"Intel Corporation. 2024. Intel 64 and IA-32 Architectures Software Developer\u2019s Manuals. Retrieved March 31, 2025 from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/intel-sdm.html"},{"key":"e_1_3_2_7_2","unstructured":"Standard Performance Evaluation Corporation. 2017. SPEC CPU 2017 Benchmark. Retrieved March 31 2025 from https:\/\/www.spec.org\/cpu2017\/"},{"issue":"5","key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"1311","DOI":"10.1109\/TC.2014.2315629","article-title":"A stochastic model for estimating the power consumption of a processor","volume":"64","author":"Dargie Waltenegus","year":"2014","unstructured":"Waltenegus Dargie. 2014. A stochastic model for estimating the power consumption of a processor. IEEE Transactions on Computers 64, 5 (2014), 1311\u20131322.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_3_2_9_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 1 (Long and Short Papers) 4171\u20134186."},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","first-page":"104794","DOI":"10.1016\/j.jpdc.2023.104794","article-title":"Vampire: A smart energy meter for synchronous monitoring in a distributed computer system","volume":"184","author":"D\u00edaz Antonio F.","year":"2024","unstructured":"Antonio F. D\u00edaz, Beatriz Prieto, Juan Jos\u00e9 Escobar, and Thomas Lampert. 2024. Vampire: A smart energy meter for synchronous monitoring in a distributed computer system. Journal of Parallel and Distributed Computing 184 (2024), 104794.","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.iot.2022.100655"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2425248.2425252"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297280.3297338"},{"key":"e_1_3_2_14_2","first-page":"2790","volume-title":"Proceedings of the 36th International Conference on Machine Learning (PMLR)","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of the 36th International Conference on Machine Learning (PMLR). PMLR, 2790\u20132799."},{"key":"e_1_3_2_15_2","unstructured":"Edward J. Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang Weizhu Chen and others. 2022. LoRA: Low-rank adaptation of large language models. ICLR 1 2 (2022) 3."},{"key":"e_1_3_2_16_2","volume-title":"MAX78000","author":"Integrated Maxim","year":"2021","unstructured":"Maxim Integrated. 2021. MAX78000. Retrieved March 31, 2025 from https:\/\/www.maximintegrated.com\/en\/products\/microcontrollers\/MAX78000.html"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/337292.337425"},{"key":"e_1_3_2_18_2","first-page":"1126","volume-title":"Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition (DATE)","author":"Lee Dongwook","year":"2015","unstructured":"Dongwook Lee, Lizy K. John, and Andreas Gerstlauer. 2015. Dynamic power and performance back-annotation for fast and accurate functional hardware simulation. In Proceedings of the 2015 Design, Automation and Test in Europe Conference and Exhibition (DATE). IEEE, 1126\u20131131."},{"key":"e_1_3_2_19_2","first-page":"469","volume-title":"Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Li Sheng","year":"2009","unstructured":"Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 469\u2013480."},{"key":"e_1_3_2_20_2","first-page":"1","article-title":"Estimation of energy consumption through parallel computing in wireless sensor networks","author":"Lounis Massinissa","year":"2024","unstructured":"Massinissa Lounis, Ahc\u00e8ne Bounceur, Reinhardt Euler, and Bernard Pottier. 2024. Estimation of energy consumption through parallel computing in wireless sensor networks. Journal of Ambient Intelligence and Humanized Computing 15, 2 (2024), 1\u201313.","journal-title":"Journal of Ambient Intelligence and Humanized Computing"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1064978.1065034"},{"key":"e_1_3_2_22_2","unstructured":"Phoronix Media. 2022. Phoronix Test Suite. Retrieved March 31 2025 from https:\/\/www.phoronix-test-suite.com\/"},{"key":"e_1_3_2_23_2","first-page":"4505","volume-title":"Proceedings of the International Conference on Machine Learning (PMLR)","author":"Mendis Charith","year":"2019","unstructured":"Charith Mendis, Alex Renda, Saman Amarasinghe, and Michael Carbin. 2019. Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks. In Proceedings of the International Conference on Machine Learning (PMLR). PMLR, 4505\u20134515."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511094"},{"key":"e_1_3_2_25_2","unstructured":"OpenOCD. 2023. Open On-Chip Debugger. Retrieved March 31 2025 from https:\/\/openocd.org\/"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00032"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/337292.337786"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2555486.2555491"},{"key":"e_1_3_2_29_2","first-page":"1","volume-title":"Proceedings of the 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)","author":"Schmitt Norbert","year":"2021","unstructured":"Norbert Schmitt, Supriya Kamthania, Nishant Rawtani, Luis Mendoza, Klaus-Dieter Lange, and Samuel Kounev. 2021. Energy-efficiency comparison of common sorting algorithms. In Proceedings of the 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). IEEE, 1\u20138."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/54.914596"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/1577129.1577137"},{"key":"e_1_3_2_32_2","first-page":"495","volume-title":"Proceedings of the 2024 IEEE 24th International Conference on Nanotechnology (NANO)","author":"TaheriNejad Nima","year":"2024","unstructured":"Nima TaheriNejad. 2024. In-memory computing: Global energy consumption, carbon footprint, technology, and products status quo. In Proceedings of the 2024 IEEE 24th International Conference on Nanotechnology (NANO). IEEE, 495\u2013500."},{"key":"e_1_3_2_33_2","first-page":"5998","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_2_34_2","first-page":"382","volume-title":"Proceedings of the 2012 Design, Automation and Test in Europe Conference and Exhibition (DATE)","author":"Wang Zhonglei","year":"2012","unstructured":"Zhonglei Wang and J\u00f6rg Henkel. 2012. Accurate source-level simulation of embedded software with respect to compiler optimizations. In Proceedings of the 2012 Design, Automation and Test in Europe Conference and Exhibition (DATE). IEEE, 382\u2013387."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480064"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2016.2578882"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725887","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T14:02:42Z","timestamp":1760709762000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725887"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,17]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3725887"],"URL":"https:\/\/doi.org\/10.1145\/3725887","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2025,10,17]]},"assertion":[{"value":"2024-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-10","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}