{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:58:53Z","timestamp":1750309133372,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,5,7]],"date-time":"2024-05-07T00:00:00Z","timestamp":1715040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"name":"Spanish Science and Technology Commission","award":["PID2019-105660RB-C22, PID2022-136454NB-C21 and TED2021-131176B-I00"],"award-info":[{"award-number":["PID2019-105660RB-C22, PID2022-136454NB-C21 and TED2021-131176B-I00"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,5,7]]},"DOI":"10.1145\/3649153.3649208","type":"proceedings-article","created":{"date-parts":[[2024,7,2]],"date-time":"2024-07-02T10:21:29Z","timestamp":1719915689000},"page":"106-114","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Hardware support for balanced co-execution in heterogeneous processors"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3695-2906","authenticated-orcid":false,"given":"Borja","family":"Perez","sequence":"first","affiliation":[{"name":"Universidad de Cantabria, Spain, Santander"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7718-8449","authenticated-orcid":false,"given":"Jose Luis","family":"Bosque","sequence":"additional","affiliation":[{"name":"Universidad de Cantabria, Spain, Santander"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,7,2]]},"reference":[{"volume-title":"Dynamic load balancing on heterogeneous multicore\/multiGPU systems","author":"Acosta 
Alejandro","key":"e_1_3_2_1_1_1","unstructured":"Alejandro Acosta, Robert Corujo, Vicente Blanco, and Francisco Almeida. 2010. Dynamic load balancing on heterogeneous multicore\/multiGPU systems. In HPCS, Waleed W. Smari and John P. McIntire (Eds.). IEEE, 467--476."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"N. Agarwal, D. Nellans, E. Ebrahimi, T. F. Wenisch, J. Danskin, and S. W. Keckler. 2016. Selective GPU caches to eliminate CPU-GPU HW cache coherence. In IEEE Int. Sym. on High Performance Computer Architecture (HPCA). 494--506.","DOI":"10.1109\/HPCA.2016.7446089"},{"key":"e_1_3_2_1_3_1","unstructured":"AMD. 2023. AMD INSTINCT\u2122 MI300A APU. Integrated CPU\/GPU accelerated processing unit for high-performance computing, generative AI and ML training. https:\/\/www.amd.com\/content\/dam\/amd\/en\/documents\/instinct-tech-docs\/data-sheets\/amd-instinct-mi300a-data-sheet.pdf"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1631"},{"key":"e_1_3_2_1_5_1","volume-title":"Understanding the Role of GPGPU-Accelerated SoC-Based ARM Clusters. In 2017 IEEE Int. Conf. Cluster Computing. 333--343","author":"Azimi R.","year":"2017","unstructured":"R. Azimi, T. Fox, and S. Reda. 2017. Understanding the Role of GPGPU-Accelerated SoC-Based ARM Clusters. In 2017 IEEE Int. Conf. Cluster Computing. 333--343."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400716"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2616314"},{"volume-title":"Proc. ACM Int. Conf. on Computing Frontiers","author":"Boyer M.","key":"e_1_3_2_1_8_1","unstructured":"M. Boyer, K. Skadron, S. Che, and N. Jayasena. 2013. Load Balancing in a Changing World: Dealing with Heterogeneity and Performance Variability. In Proc. ACM Int. Conf. on Computing Frontiers (Ischia, Italy). 
ACM, Article 21, 10 pages."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.2316825"},{"volume-title":"Architectural Support for Task Dependence Management with Flexible Software Scheduling. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 283--295","author":"Castillo E.","key":"e_1_3_2_1_10_1","unstructured":"E. Castillo, L. Alvarez, M. Moreto, M. Casas, E. Vallejo, J. L. Bosque, R. Beivide, and M. Valero. 2018. Architectural Support for Task Dependence Management with Flexible Software Scheduling. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). 283--295."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-014-1316-5"},{"volume-title":"CATA: Criticality Aware Task Acceleration for Multicore Processors. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 413--422","author":"Castillo E.","key":"e_1_3_2_1_12_1","unstructured":"E. Castillo, M. Moreto, M. Casas, L. Alvarez, E. Vallejo, K. Chronaki, R. Badia, J. L. Bosque, R. Beivide, E. Ayguade, J. Labarta, and M. Valero. 2016. CATA: Criticality Aware Task Acceleration for Multicore Processors. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 413--422."},{"volume-title":"2016 IEEE International Symposium on Workload Characterization (IISWC). 1--10","author":"Garc\u00eda V.","key":"e_1_3_2_1_13_1","unstructured":"V. Garc\u00eda, J. Gomez-Luna, T. Grass, A. Rico, E. Ayguade, and A. J. Pena. 2016. Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications. In 2016 IEEE International Symposium on Workload Characterization (IISWC). 1--10."},{"volume-title":"Proc. of IPDPS. 1299--1308","author":"Gautier T.","key":"e_1_3_2_1_14_1","unstructured":"T. Gautier, J.V.F. Lima, N. Maillard, and B. Raffin. 2013. 
XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures. In Proc. of IPDPS. 1299--1308."},{"volume-title":"2014 IEEE International Symposium on Workload Characterization (IISWC). 150--160","author":"Hestness J.","key":"e_1_3_2_1_15_1","unstructured":"J. Hestness, S. W. Keckler, and D. A. Wood. 2014. A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior. In 2014 IEEE International Symposium on Workload Characterization (IISWC). 150--160."},{"volume-title":"GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors. In 2015 IEEE International Symposium on Workload Characterization. 87--97","author":"Hestness J.","key":"e_1_3_2_1_16_1","unstructured":"J. Hestness, S. W. Keckler, and D. A. Wood. 2015. GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors. In 2015 IEEE International Symposium on Workload Characterization. 87--97."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"D. R. Kaeli, P. Mistry, D. Schaa, and D. P. Zhang. 2015. Heterogeneous Computing with OpenCL 2.0 (1st ed.). Morgan Kaufmann Publishers Inc.","DOI":"10.1016\/B978-0-12-801414-1.00001-6"},{"key":"e_1_3_2_1_18_1","volume-title":"Proc. of PACT","author":"Kaleem R.","year":"2014","unstructured":"R. Kaleem et al. 2014. Adaptive Heterogeneous Scheduling for Integrated GPUs. In Proc. of PACT. 151--162."},{"volume-title":"Proc. of the ACM PPoPP. ACM, 277--287","author":"Kim J.","key":"e_1_3_2_1_19_1","unstructured":"J. Kim, H. Kim, J. H. Lee, and J. Lee. 2011. Achieving a Single Compute Device Image in OpenCL for Multiple GPUs. In Proc. of the ACM PPoPP. ACM, 277--287."},{"volume-title":"Proc. of the ACM ICS (Italy). 341--352","author":"Kim J.","key":"e_1_3_2_1_20_1","unstructured":"J. Kim, S. Seo, J. Lee, J. Nah, G. Jo, and J. Lee. 2012. SnuCL: An OpenCL Framework for Heterogeneous CPU\/GPU Clusters. In Proc. of the ACM ICS (Italy). 
341--352."},{"key":"e_1_3_2_1_21_1","unstructured":"D. B. Kirk and W. W. Hwu. 2010. Programming Massively Parallel Processors: A Hands-on Approach (1st ed.). Morgan Kaufmann."},{"volume-title":"Proc. of PACT","author":"Lee J.","key":"e_1_3_2_1_22_1","unstructured":"J. Lee, M. Samadi, Y. Park, and S. Mahlke. 2013. Transparent CPU-GPU Collaboration for Data-parallel Kernels on Heterogeneous Systems. In Proc. of PACT (Scotland, UK). IEEE Press, 245--256."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2798725"},{"volume-title":"Proc. of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO 42)","author":"Luk C.","key":"e_1_3_2_1_24_1","unstructured":"C. Luk, S. Hong, and H. Kim. 2009. Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping. In Proc. of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO 42). ACM, 45--55."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-014-1200-3"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-85665-6_31"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10192386"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2020.02.016"},{"key":"e_1_3_2_1_29_1","unstructured":"NVIDIA. 2023. NVIDIA GH200 Grace Hopper Superchip. The breakthrough accelerated CPU for large-scale AI and high-performance computing (HPC) applications. https:\/\/resources.nvidia.com\/en-us-grace-cpu\/grace-hopper-superchip"},{"key":"e_1_3_2_1_30_1","volume-title":"Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices. In Proc","author":"Pandit P.","year":"2014","unstructured":"P. Pandit and R. Govindarajan. 2014. Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices. In Proc. Annual IEEE\/ACM CGO. 
ACM, Article 273, 11 pages."},{"volume-title":"Simplifying Programming and Load Balancing of Data Parallel Applications on Heterogeneous Systems. In 9th Annual Workshop on General Purpose Processing Using Graphics Processing Unit (GPGPU '16)","author":"P\u00e9rez B.","key":"e_1_3_2_1_31_1","unstructured":"B. P\u00e9rez, J. L. Bosque, and R. Beivide. 2016. Simplifying Programming and Load Balancing of Data Parallel Applications on Heterogeneous Systems. In 9th Annual Workshop on General Purpose Processing Using Graphics Processing Unit (GPGPU '16). ACM, 42--51."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2021.06.003"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"J. Power, A. Basu, J. Gu, S. Puthoor, B. M. Beckmann, M. D. Hill, S. K. Reinhardt, and D. A. Wood. 2013. Heterogeneous System Coherence for Integrated CPU-GPU Systems. In 46th IEEE\/ACM Int. Sym. on Microarchitecture (MICRO-46). ACM, 457--467.","DOI":"10.1145\/2540708.2540747"},{"volume-title":"Maestro: Data Orchestration and Tuning for OpenCL Devices. In 16th International Euro-Par Conference on Parallel Processing: Part II","author":"Spafford K.","key":"e_1_3_2_1_34_1","unstructured":"K. Spafford, J. Meredith, and J. Vetter. 2010. Maestro: Data Orchestration and Tuning for OpenCL Devices. In 16th International Euro-Par Conference on Parallel Processing: Part II (Ischia, Italy) (Euro-Par '10). Springer-Verlag, 275--286."},{"volume-title":"Conf. on Parallel and Distributed Computing. 710--722","author":"Stafford Esteban","key":"e_1_3_2_1_35_1","unstructured":"Esteban Stafford, B. P\u00e9rez, J. L. Bosque, R. Beivide, and M. Valero. 2017. To Distribute or Not to Distribute: The Question of Load Balancing for Performance or Energy. In Euro-Par 2017: Parallel Processing - 23rd Int. Conf. on Parallel and Distributed Computing. 710--722."},{"volume-title":"2015 IEEE Int. Conference on Cluster Computing. 
533--534","author":"Ukidave Y.","key":"e_1_3_2_1_36_1","unstructured":"Y. Ukidave, D. Kaeli, U. Gupta, and K. Keville. 2015. Performance of the NVIDIA Jetson TK1 in HPC. In 2015 IEEE Int. Conference on Cluster Computing. 533--534."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"T. Vijayaraghavan, Y. Eckert, G. H. Loh, M. J. Schulte, M. Ignatowski, B. M. Beckmann, W. C. Brantley, J. L. Greathouse, W. Huang, A. Karunanithi, O. Kayiran, M. Meswani, I. Paul, M. Poremba, S. Raasch, S. K. Reinhardt, G. Sadowski, and V. Sridharan. 2017. Design and Analysis of an APU for Exascale Computing. In 2017 IEEE Int. Sym. on High Performance Computer Architecture (HPCA). 85--96.","DOI":"10.1109\/HPCA.2017.42"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.05.213"},{"key":"e_1_3_2_1_39_1","volume-title":"Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture. HPEC","author":"Wen Hao","year":"2019","unstructured":"Hao Wen and Wei Zhang. 2019. Heterogeneous Cache Hierarchy Management for Integrated CPU-GPU Architecture. HPEC (2019), 1--6."},{"volume-title":"2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. 1--3.","author":"Wilcox K.","key":"e_1_3_2_1_40_1","unstructured":"K. Wilcox, D. Akeson, H. R. Fair, J. Farrell, D. Johnson, G. Krishnan, H. McIntyre, E. McLellan, S. Naffziger, R. Schreiber, S. Sundaram, and J. White. 2015. 4.8 A 28nm x86 APU optimized for power and area efficiency. In 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers. 1--3."},{"volume-title":"IEEE International Symposium on High-Performance Computer Architecture. 1--12","author":"Yang Y.","key":"e_1_3_2_1_41_1","unstructured":"Y. Yang, P. Xiang, M. Mantor, and H. Zhou. 2012. CPU-assisted GPGPU on fused CPU-GPU architectures. In IEEE International Symposium on High-Performance Computer Architecture. 
1--12."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688505"},{"key":"e_1_3_2_1_43_1","first-page":"2506","issue":"9","article-title":"Data Partitioning on Multi-core and Multi-GPU Platforms Using Functional Performance Models","volume":"64","author":"Zhong Ziming","year":"2015","unstructured":"Ziming Zhong, V. Rychkov, and A. Lastovetsky. 2015. Data Partitioning on Multi-core and Multi-GPU Platforms Using Functional Performance Models. IEEE Transactions on Computers 64, 9 (Sept 2015), 2506--2518.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786572.2786596"}],"event":{"name":"CF '24: 21st ACM International Conference on Computing Frontiers","sponsor":["SIGMICRO ACM Special Interest Group on Microarchitectural Research and Processing"],"location":"Ischia Italy","acronym":"CF '24"},"container-title":["Proceedings of the 21st ACM International Conference on Computing Frontiers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3649153.3649208","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3649153.3649208","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:50:02Z","timestamp":1750287002000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3649153.3649208"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,7]]},"references-count":44,"alternative-id":["10.1145\/3649153.3649208","10.1145\/3649153"],"URL":"https:\/\/doi.org\/10.1145\/3649153.3649208","relation":{},"subject":[],"published":{"date-parts":[[2024,5,7]]},"assertion":[{"value":"2024-07-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}