{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T10:23:14Z","timestamp":1771064594451,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T00:00:00Z","timestamp":1609286400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Federal Ministry of Education and Research of Germany","award":["01IH16007"],"award-info":[{"award-number":["01IH16007"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,3,31]]},"abstract":"<jats:p>Characterizing compute kernel execution behavior on GPUs for efficient task scheduling is a non-trivial task. We address this with a simple model enabling portable and fast predictions among different GPUs using only hardware-independent features. This model is built based on random forests using 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU, and SHOC. 
Evaluation of the model performance using cross-validation yields a median Mean Average Percentage Error (MAPE) of 8.86\u201352.0% for time and 1.84\u20132.94% for power prediction across five different GPUs, while latency for a single prediction varies between 15 and 108 ms.<\/jats:p>","DOI":"10.1145\/3431731","type":"journal-article","created":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T12:30:51Z","timestamp":1609331451000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4051-8950","authenticated-orcid":false,"given":"Lorenz","family":"Braun","sequence":"first","affiliation":[{"name":"Institute of Computer Engineering, Heidelberg University, Germany"}]},{"given":"Sotirios","family":"Nikas","sequence":"additional","affiliation":[{"name":"Engineering Mathematics and Computing Lab, Heidelberg University, Germany"}]},{"given":"Chen","family":"Song","sequence":"additional","affiliation":[{"name":"Engineering Mathematics and Computing Lab, Heidelberg University, Germany"}]},{"given":"Vincent","family":"Heuveline","sequence":"additional","affiliation":[{"name":"Engineering Mathematics and Computing Lab, Heidelberg University, Germany"}]},{"given":"Holger","family":"Fr\u00f6ning","sequence":"additional","affiliation":[{"name":"Institute of Computer Engineering, Heidelberg University, Germany"}]}],"member":"320","published-online":{"date-parts":[[2020,12,30]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2016.7482105"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the IEEE 15th International Symposium on Network Computing and Applications (NCA\u201916)","author":"Amaris M.","year":"2016","unstructured":"M. Amaris , R. Y. de Camargo , M. 
Dyab, A. Goldman, and D. Trystram. 2016. A comparison of GPU execution time prediction using machine learning and analytical modeling. In Proceedings of the IEEE 15th International Symposium on Network Computing and Applications (NCA\u201916). IEEE, 326--333. DOI:https:\/\/doi.org\/10.1109\/NCA.2016.7778637"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2019.2904497"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837853.1693470"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. IEEE, 163--174","author":"Bakhoda Ali","year":"2009","unstructured":"Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, and Tor M. Aamodt. 2009. Analyzing CUDA workloads using a detailed GPU simulator. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. IEEE, 163--174. 
DOI:https:\/\/doi.org\/10.1109\/ispass.2009.4919648"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375527.1375580"},{"key":"e_1_2_1_7_1","first-page":"045","article-title":"Performance metrics (error measures) in machine learning regression, forecasting and prognostics: Properties and typology","volume":"14","author":"Botchkarev Alexei","year":"2019","unstructured":"Alexei Botchkarev. 2019. Performance metrics (error measures) in machine learning regression, forecasting and prognostics: Properties and typology. Interdisc. J. Info. Knowl. Manage. 14 (2019), 045--076. DOI:https:\/\/doi.org\/10.28945\/4184","journal-title":"Interdisc. J. Info. Knowl. Manage."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/PMBS49563.2019.00014"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2576779.2576783"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 46th International Conference on Parallel Processing Workshops (ICPPW\u201917)","author":"Thomas","year":"2017","unstructured":"Thomas C. Carroll and Prudence W. H. Wong. 2017. An Improved abstract GPU model with data transfer. In Proceedings of the 46th International Conference on Parallel Processing Workshops (ICPPW\u201917). IEEE, 113--120. 
DOI:https:\/\/doi.org\/10.1109\/ICPPW.2017.28"},{"key":"e_1_2_1_12_1","volume-title":"Talbot","author":"Cawley Gavin C.","year":"2010","unstructured":"Gavin C. Cawley and Nicola L. C. Talbot. 2010. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11 (Aug. 2010), 2079--2107."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2009.5306797"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the International Green Computing Conference and Workshops. 1--6. DOI:https:\/\/doi.org\/10","author":"Chen J.","year":"2011","unstructured":"J. Chen, Bin Li, Ying Zhang, L. Peng, and J. Peir. 2011. Statistical GPU power analysis using tree-based methods. In Proceedings of the International Green Computing Conference and Workshops. 1--6. DOI:https:\/\/doi.org\/10.1109\/IGCC.2011.6008582"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS\u201913)","author":"Choi J. W.","year":"2013","unstructured":"J. W. Choi, D. Bedard, R. Fowler, and R. Vuduc. 2013. A roofline model of energy. 
In Proceedings of the IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS\u201913). 661--672. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2013.77"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2018.022071134"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3'10)","author":"Danalis Anthony","unstructured":"Anthony Danalis, Gabriel Marin, Collin McCurdy, Jeremy S. Meredith, Philip C. Roth, Kyle Spafford, Vinod Tipparaju, and Jeffrey S. Vetter. 2010. The scalable heterogeneous computing (SHOC) benchmark suite. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU-3'10). ACM, 63--74. DOI:https:\/\/doi.org\/10.1145\/1735688.1735702"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337833"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the Conference on Innovative Parallel Computing (InPar\u201912)","author":"Grauer-Gray S.","year":"2012","unstructured":"S. Grauer-Gray, L. Xu, R. Searles, S. Ayalasomayajula, and J. Cavazos. 2012. Auto-tuning a high-level language targeted to GPU codes. 
In Proceedings of the Conference on Innovative Parallel Computing (InPar\u201912). 1--10. DOI:https:\/\/doi.org\/10.1109\/InPar.2012.6339595"},{"key":"e_1_2_1_20_1","volume-title":"Modern Compiler Design","author":"Grune Dick","unstructured":"Dick Grune, Kees van Reeuwijk, Henri E. Bal, Ceriel J. H. Jacobs, and Koen Langendoen. 2012. Modern Compiler Design. Springer, New York, NY. DOI:https:\/\/doi.org\/10.1007\/978-1-4614-4699-6"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Guerreiro J.","year":"2018","unstructured":"J. Guerreiro, A. Ilic, N. Roma, and P. Tomas. 2018. GPGPU power modeling for multi-domain voltage-frequency scaling. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201918). IEEE, 789--800. DOI:https:\/\/doi.org\/10.1109\/HPCA.2018.00072"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2951218"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555754.1555775"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815998"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 47th IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201914)","author":"Huang Jen-Cheng","year":"2014","unstructured":"Jen-Cheng Huang, Joo Hwan Lee, Hyesoon Kim, and Hsien-Hsin S. Lee. 
2014. GPUMech: GPU performance modeling technique based on interval analysis. In Proceedings of the 47th IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201914). IEEE, 268--279. DOI:https:\/\/doi.org\/10.1109\/MICRO.2014.59"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCS.2018.00095"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the IEEE International Parallel Distributed Processing Symposium Workshops (IPDPSW\u201914)","author":"Koike A.","year":"2014","unstructured":"A. Koike and K. Sadakane. 2014. A novel computational model for GPUs with application to I\/O optimal sorting algorithms. In Proceedings of the IEEE International Parallel Distributed Processing Symposium Workshops (IPDPSW\u201914). 614--623. DOI:https:\/\/doi.org\/10.1109\/IPDPSW.2014.72"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 16th IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201910)","author":"Kundu S.","year":"2010","unstructured":"S. 
Kundu, R. Rangaswami, K. Dutta, and M. Zhao. 2010. Application performance modeling in a virtualized environment. In Proceedings of the 16th IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201910). IEEE, 1--10. DOI:https:\/\/doi.org\/10.1109\/HPCA.2010.5463058"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43659-3_7"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2611758"},{"key":"e_1_2_1_32_1","volume-title":"The landscape of GPGPU performance modeling tools. Parallel Comput. 56 (Aug","author":"Madougou Souley","year":"2016","unstructured":"Souley Madougou, Ana Varbanescu, Cees de Laat, and Rob van Nieuwpoort. 2016. The landscape of GPGPU performance modeling tools. Parallel Comput. 56 (Aug. 2016), 18--33. DOI:https:\/\/doi.org\/10.1016\/j.parco.2016.04.002"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917)","author":"Majumdar A.","year":"2017","unstructured":"A. Majumdar, L. Piga, I. Paul, J. L. Greathouse, W. Huang, and D. H. Albonesi. 2017. Dynamic GPGPU power management using adaptive model predictive control. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201917). IEEE, 613--624. 
DOI:https:\/\/doi.org\/10.1109\/HPCA.2017.34"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/GREENCOMP.2010.5598315"},{"key":"e_1_2_1_35_1","unstructured":"NVIDIA. 2012. NVIDIA system management interface. Retrieved from https:\/\/developer.nvidia.com\/nvidia-system-management-interface."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522716"},{"key":"e_1_2_1_37_1","volume-title":"Hennessy","author":"Patterson David A.","year":"2012","unstructured":"David A. Patterson and John L. Hennessy. 2012. Computer Organization and Design: The Hardware\/Software Interface. (4th ed.). Elsevier Morgan Kaufmann, Amsterdam.","edition":"4"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-64203-1_8"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-20656-7_3"},{"key":"e_1_2_1_41_1","unstructured":"Mark R. Segal. 2004. Machine learning benchmarks and random forest regression. UCSF: Center for Bioinformatics and Molecular Biostatistics. Retrieved from https:\/\/escholarship.org\/uc\/item\/35x3v9t4."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS\u201913)","author":"Song Shuaiwen","year":"2013","unstructured":"Shuaiwen Song, Chunyi Su, Barry Rountree, and Kirk W. Cameron. 2013. 
A simplified and accurate model of power-performance efficiency on emergent GPU architectures. In Proceedings of the IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS\u201913). IEEE, 673--686. DOI:https:\/\/doi.org\/10.1109\/IPDPS.2013.73"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the IEEE International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912)","author":"Spafford K. L.","year":"2012","unstructured":"K. L. Spafford and J. S. Vetter. 2012. Aspen: A domain specific language for performance modeling. In Proceedings of the IEEE International Conference on High Performance Computing, Networking, Storage and Analysis (SC\u201912). IEEE, 1--11. DOI:https:\/\/doi.org\/10.1109\/SC.2012.20"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 42nd International Symposium on Computer Architecture (ISCA\u201915)","author":"Stephenson Mark","unstructured":"Mark Stephenson, Siva Kumar Sastry Hari, Yunsup Lee, Eiman Ebrahimi, Daniel R. Johnson, David Nellans, Mike O\u2019Connor, and Stephen W. Keckler. 2015. Flexible software profiling of GPU architectures. In Proceedings of the 42nd International Symposium on Computer Architecture (ISCA\u201915). ACM, Portland, OR, 185--197. 
DOI:https:\/\/doi.org\/10.1145\/2749469.2750375"},{"key":"e_1_2_1_45_1","volume-title":"Geng Daniel Liu, and Wen-mei W. Hwu","author":"Stratton John A.","year":"2012","unstructured":"John A. Stratton, Christopher Rodrigues, I-Jui Sung, Nady Obeid, Li-Wen Chang, Nasser Anssari, Geng Daniel Liu, and Wen-mei W. Hwu. 2012. Parboil: A revised benchmark suite for scientific and commercial throughput computing. IMPACT Technical Report, IMPACT-12-01, University of Illinois, at Urbana-Champaign (Mar. 2012), 12."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the IEEE 37th International Conference on Distributed Computing Systems (ICDCS\u201917)","author":"Thinakaran P.","year":"2017","unstructured":"P. Thinakaran, J. R. Gunasekaran, B. Sharma, M. T. Kandemir, and C. R. Das. 2017. Phoenix: A constraint-aware scheduler for heterogeneous datacenters. 
In Proceedings of the IEEE 37th International Conference on Distributed Computing Systems (ICDCS\u201917). IEEE, 977--987. DOI:https:\/\/doi.org\/10.1109\/ICDCS.2017.262"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1214\/08-AOAS224"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79181"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS\u201918)","author":"Wang Q.","year":"2018","unstructured":"Q. Wang and X. Chu. 2018. GPGPU performance estimation with core and memory frequency scaling. In Proceedings of the IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS\u201918). IEEE, 417--424. DOI:https:\/\/doi.org\/10.1109\/PADSW.2018.8645000"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919)","author":"Wang X.","year":"2019","unstructured":"X. Wang, K. Huang, A. Knoll, and X. Qian. 2019. A hybrid framework for fast and accurate GPU performance estimation through source-level analysis and trace-based simulation. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture (HPCA\u201919). 506--518. 
DOI:https:\/\/doi.org\/10.1109\/HPCA.2019.00062"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056063"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the IEEE 17th International Symposium on High Performance Computer Architecture (HPCA\u201911)","author":"Zhang Y.","year":"2011","unstructured":"Y. Zhang and J. D. Owens. 2011. A quantitative performance analysis model for GPU architectures. In Proceedings of the IEEE 17th International Symposium on High Performance Computer Architecture (HPCA\u201911). IEEE, 382--393. DOI:https:\/\/doi.org\/10.1109\/HPCA.2011.5749745"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3431731","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3431731","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:46Z","timestamp":1750195486000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3431731"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,30]]},"references-count":51,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,3,31]]}},"alternative-id":["10.1145\/3431731"],"URL":"https:\/\/doi.org\/10.1145\/3431731","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,30]]},"assertion":[{"value":"2019-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-12-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}