{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,21]],"date-time":"2025-12-21T10:03:50Z","timestamp":1766311430599,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,5,25]],"date-time":"2020-05-25T00:00:00Z","timestamp":1590364800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100010669","name":"H2020 LEIT Information and Communication Technologies","doi-asserted-by":"publisher","award":["871669"],"award-info":[{"award-number":["871669"]}],"id":[{"id":"10.13039\/100010669","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,5,25]]},"DOI":"10.1145\/3378678.3391881","type":"proceedings-article","created":{"date-parts":[[2020,5,26]],"date-time":"2020-05-26T00:21:35Z","timestamp":1590452495000},"page":"42-47","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["OpenMP to CUDA graphs"],"prefix":"10.1145","author":[{"given":"Chenle","family":"Yu","sequence":"first","affiliation":[{"name":"Barcelona Supercomputing Center"}]},{"given":"Sara","family":"Royuela","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center"}]},{"given":"Eduardo","family":"Qui\u00f1ones","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center"}]}],"member":"320","published-online":{"date-parts":[[2020,5,25]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Muthu Manikandan Baskaran Jj Ramanujam and P Sadayappan. 2010. Automatic C-to-CUDA Code Generation for Affine Programs. In CC. 244--263.  Muthu Manikandan Baskaran Jj Ramanujam and P Sadayappan. 2010. 
Automatic C-to-CUDA Code Generation for Affine Programs. In CC. 244--263.","DOI":"10.1007\/978-3-642-11970-5_14"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Carlo Bertolli Samuel F Antao Alexandre E Eichenberger Kevin OBrien Zehra Sura Arpith C Jacob Tong Chen and Olivier Sallenave. 2014. Coordinating GPU threads for OpenMP 4.0 in LLVM. In LLVM Compiler Infrastructure in HPC.  Carlo Bertolli Samuel F Antao Alexandre E Eichenberger Kevin OBrien Zehra Sura Arpith C Jacob Tong Chen and Olivier Sallenave. 2014. Coordinating GPU threads for OpenMP 4.0 in LLVM. In LLVM Compiler Infrastructure in HPC.","DOI":"10.1109\/LLVM-HPC.2014.10"},{"key":"e_1_3_2_1_3_1","unstructured":"BSC. 2020. Mercurium. https:\/\/pm.bsc.es\/mcxx  BSC. 2020. Mercurium. https:\/\/pm.bsc.es\/mcxx"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Barbara Chapman Lei Huang Eric Biscondi Eric Stotzer Ashish Shrivastava and Alan Gatherer. 2009. Implementing OpenMP on a High Performance Embedded Multicore MPSoC. In IPDPS.  Barbara Chapman Lei Huang Eric Biscondi Eric Stotzer Ashish Shrivastava and Alan Gatherer. 2009. Implementing OpenMP on a High Performance Embedded Multicore MPSoC. In IPDPS.","DOI":"10.1109\/IPDPS.2009.5161107"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Kallia Chronaki Alejandro Rico Rosa M Badia Eduard Ayguad\u00e9 Jes\u00fas Labarta and Mateo Valero. 2015. Criticality-aware Dynamic Task Scheduling for Heterogeneous Architectures. In ICS.  Kallia Chronaki Alejandro Rico Rosa M Badia Eduard Ayguad\u00e9 Jes\u00fas Labarta and Mateo Valero. 2015. Criticality-aware Dynamic Task Scheduling for Heterogeneous Architectures. In ICS.","DOI":"10.1145\/2751205.2751235"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626411000151"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Jianbin Fang Ana Lucia Varbanescu and Henk Sips. 2011. 
A comprehensive Performance Comparison of CUDA and OpenCL. In ICPP. 216--225.  Jianbin Fang Ana Lucia Varbanescu and Henk Sips. 2011. A comprehensive Performance Comparison of CUDA and OpenCL. In ICPP. 216--225.","DOI":"10.1109\/ICPP.2011.45"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Tetsuya Hoshino Naoya Maruyama Satoshi Matsuoka and Ryoji Takaki. 2013. CUDA vs OpenACC: Performance case studies with kernel benchmarks and a memory-bound CFD application. In CCGRID. IEEE 136--143.  Tetsuya Hoshino Naoya Maruyama Satoshi Matsuoka and Ryoji Takaki. 2013. CUDA vs OpenACC: Performance case studies with kernel benchmarks and a memory-bound CFD application. In CCGRID. IEEE 136--143.","DOI":"10.1109\/CCGrid.2013.12"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"G\u00e9raud Krawezik. 2003. Performance Comparison of MPI and Three OpenMP Programming Styles on Shared Memory Multiprocessors. In SPAA. 118--127.  G\u00e9raud Krawezik. 2003. Performance Comparison of MPI and Three OpenMP Programming Styles on Shared Memory Multiprocessors. In SPAA. 118--127.","DOI":"10.1145\/777412.777433"},{"volume-title":"Cray User Group Meeting","year":"2016","author":"Larrea V Vergara","key":"e_1_3_2_1_10_1"},{"key":"e_1_3_2_1_11_1","unstructured":"Seyong Lee and Rudolf Eigenmann. 2010. OpenMPC: Extended OpenMP Programming and Tuning for GPUs. In SC. 1--11.  Seyong Lee and Rudolf Eigenmann. 2010. OpenMPC: Extended OpenMP Programming and Tuning for GPUs. In SC. 1--11."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Seyong Lee Seung-Jai Min and Rudolf Eigenmann. 2009. OpenMP to GPGPU: a Compiler Framework for Automatic Translation and Optimization. ACM Sigplan Notices 44 4 (2009).  Seyong Lee Seung-Jai Min and Rudolf Eigenmann. 2009. OpenMP to GPGPU: a Compiler Framework for Automatic Translation and Optimization. 
ACM Sigplan Notices 44 4 (2009).","DOI":"10.1145\/1594835.1504194"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Andrea Marongiu Paolo Burgio and Luca Benini. 2011. Supporting OpenMP on a Multi-cluster Embedded MPSoC. Microprocessors and Microsystems 35 8 (2011).  Andrea Marongiu Paolo Burgio and Luca Benini. 2011. Supporting OpenMP on a Multi-cluster Embedded MPSoC. Microprocessors and Microsystems 35 8 (2011).","DOI":"10.1016\/j.micpro.2011.08.010"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Matt Martineau Simon McIntosh-Smith and Wayne Gaudin. 2016. Evaluating OpenMP 4.0's Effectiveness as a Heterogeneous Parallel Programming Model. In IPDPSW. 338--347.  Matt Martineau Simon McIntosh-Smith and Wayne Gaudin. 2016. Evaluating OpenMP 4.0's Effectiveness as a Heterogeneous Parallel Programming Model. In IPDPSW. 338--347.","DOI":"10.1109\/IPDPSW.2016.70"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3110355.3110356"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Adri\u00e1n Munera S\u00e1nchez Sara Royuela and Eduardo Qui\u00f1ones. 2020. Towards a Qualifiable OpenMP Framework for Embedded Systems. In DATE.  Adri\u00e1n Munera S\u00e1nchez Sara Royuela and Eduardo Qui\u00f1ones. 2020. Towards a Qualifiable OpenMP Framework for Embedded Systems. In DATE.","DOI":"10.23919\/DATE48585.2020.9116230"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Gabriel Noaje Christophe Jaillet and Micha\u00ebl Krajecki. 2011. Source-to-source Code Translator: OpenMP C to CUDA. In HPCC. 512--519.  Gabriel Noaje Christophe Jaillet and Micha\u00ebl Krajecki. 2011. Source-to-source Code Translator: OpenMP C to CUDA. In HPCC. 512--519.","DOI":"10.1109\/HPCC.2011.73"},{"key":"e_1_3_2_1_18_1","unstructured":"NVIDIA. 2019. Profiler's user guide. https:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/  NVIDIA. 2019. Profiler's user guide. 
https:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/"},{"key":"e_1_3_2_1_19_1","unstructured":"NVIDIA. 2019. Programming Guide :: CUDA Toolkit Documentation. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/  NVIDIA. 2019. Programming Guide :: CUDA Toolkit Documentation. https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/"},{"key":"e_1_3_2_1_20_1","unstructured":"OpenMP ARB. 2018. OpenMP Application Program Interface version 5.0. https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5.0.pdf  OpenMP ARB. 2018. OpenMP Application Program Interface version 5.0. https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5.0.pdf"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-012-0853-z"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Sara Royuela Alejandro Duran Maria A Serrano Eduardo Qui\u00f1ones and Xavier Martorell. 2017. A Functional Safety OpenMP* for Critical Real-Time Embedded Systems. In IWOMP.  Sara Royuela Alejandro Duran Maria A Serrano Eduardo Qui\u00f1ones and Xavier Martorell. 2017. A Functional Safety OpenMP * for Critical Real-Time Embedded Systems. In IWOMP.","DOI":"10.1007\/978-3-319-65578-9_16"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Sara Royuela Roger Ferrer Diego Caballero and Xavier Martorell. 2015. Compiler analysis for OpenMP tasks correctness. In CF. 1--8.  Sara Royuela Roger Ferrer Diego Caballero and Xavier Martorell. 2015. Compiler analysis for OpenMP tasks correctness. In CF. 1--8.","DOI":"10.1145\/2742854.2742882"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Florentino Sainz Sergi Mateo Vicen\u00e7 Beltran Jose L Bosque Xavier Martorell and Eduard Ayguad\u00e9. 2014. Leveraging OmpSs to Exploit Hardware Accelerators. In SBAC-PAD. 112--119.  Florentino Sainz Sergi Mateo Vicen\u00e7 Beltran Jose L Bosque Xavier Martorell and Eduard Ayguad\u00e9. 2014. 
Leveraging OmpSs to Exploit Hardware Accelerators. In SBAC-PAD. 112--119.","DOI":"10.1109\/SBAC-PAD.2014.26"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Maria A Serrano Alessandra Melani Roberto Vargas Andrea Marongiu Marko Bertogna and Eduardo Qui\u00f1ones. 2015. Timing Characterization of OpenMP4 Tasking Model. In CASES. 157--166.  Maria A Serrano Alessandra Melani Roberto Vargas Andrea Marongiu Marko Bertogna and Eduardo Qui\u00f1ones. 2015. Timing Characterization of OpenMP4 Tasking Model. In CASES. 157--166.","DOI":"10.1109\/CASES.2015.7324556"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Maria A Serrano Sara Royuela and Eduardo Qui\u00f1ones. 2018. Towards an OpenMP Specification for Critical Real-time Systems. In IWOMP. 143--159.  Maria A Serrano Sara Royuela and Eduardo Qui\u00f1ones. 2018. Towards an OpenMP Specification for Critical Real-time Systems. In IWOMP. 143--159.","DOI":"10.1007\/978-3-319-98521-3_10"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Jie Shen Jianbin Fang Henk Sips and Ana Lucia Varbanescu. 2012. Performance Gaps between OpenMP and OpenCL for Multi-core CPUs. In ICPP. 116--125.  Jie Shen Jianbin Fang Henk Sips and Ana Lucia Varbanescu. 2012. Performance Gaps between OpenMP and OpenCL for Multi-core CPUs. In ICPP. 116--125.","DOI":"10.1109\/ICPPW.2012.18"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"John E Stone David Gohara and Guochun Shi. 2010. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems. Computing in Science & Engineering 12 3 (2010).  John E Stone David Gohara and Guochun Shi. 2010. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems. 
Computing in Science & Engineering 12 3 (2010).","DOI":"10.1109\/MCSE.2010.69"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Vargas Roberto E and Royuela Sara and Serrano Maria A and Martorell Xavi and Qui\u00f1ones Eduardo. 2016. A lightweight OpenMP4 Run-time for Embedded Systems. In ASP-DAC.  Vargas Roberto E and Royuela Sara and Serrano Maria A and Martorell Xavi and Qui\u00f1ones Eduardo. 2016. A lightweight OpenMP4 Run-time for Embedded Systems. In ASP-DAC.","DOI":"10.1109\/ASPDAC.2016.7427987"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Sandra Wienke Paul Springer Christian Terboven and Dieter an Mey. 2012. OpenACC - First Experiences with Real-world Applications. In Euro-Par.  Sandra Wienke Paul Springer Christian Terboven and Dieter an Mey. 2012. OpenACC - First Experiences with Real-world Applications. In Euro-Par.","DOI":"10.1007\/978-3-642-32820-6_85"}],"event":{"name":"SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems","sponsor":["SIGBED ACM Special Interest Group on Embedded Systems","EDAA European Design Automation Association"],"location":"St. 
Goar Germany","acronym":"SCOPES '20"},"container-title":["Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3378678.3391881","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3378678.3391881","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:19Z","timestamp":1750200079000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3378678.3391881"}},"subtitle":["a compiler-based transformation to enhance the programmability of NVIDIA devices"],"short-title":[],"issued":{"date-parts":[[2020,5,25]]},"references-count":30,"alternative-id":["10.1145\/3378678.3391881","10.1145\/3378678"],"URL":"https:\/\/doi.org\/10.1145\/3378678.3391881","relation":{},"subject":[],"published":{"date-parts":[[2020,5,25]]},"assertion":[{"value":"2020-05-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}