{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,25]],"date-time":"2025-03-25T18:28:31Z","timestamp":1742927311749,"version":"3.40.3"},"publisher-location":"Cham","reference-count":26,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031725661"},{"type":"electronic","value":"9783031725678"}],"license":[{"start":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T00:00:00Z","timestamp":1704067200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T00:00:00Z","timestamp":1726444800000},"content-version":"vor","delay-in-days":259,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The throughput-centric design of GPUs poses challenges when integrating them into time-sensitive applications. Nevertheless, modern GPU architectures and software have recently evolved, making it possible to minimize overheads and interference along the critical path through advanced mechanisms, such as GPU graphs, while sustaining high throughput. However, GPU vendors provide programming ecosystems specific to their products, raising concerns about code portability. Hence, there is a need for a hardware-agnostic API capable of managing time-sensitive GPU-accelerated pipelines. In this context, we propose integrating event-based synchronizations into the high-level OpenMP programming model to, in combination with GPU graphs, notably reduce interference and overheads over the critical path. This work showcases how this combination offers significant performance improvements and time consistency. 
We also enable portability across several vendor ecosystems and demonstrate our work on a set of representative applications for cyber-physical systems. According to our experiments, we measured a maximum jitter below 20\u00a0<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\upmu $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03bc<\/mml:mi>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>s, representing less than 5% of time variation.<\/jats:p>","DOI":"10.1007\/978-3-031-72567-8_3","type":"book-chapter","created":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T16:19:25Z","timestamp":1726762765000},"page":"31-45","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Event-Based OpenMP Tasks for\u00a0Time-Sensitive GPU-Accelerated Systems"],"prefix":"10.1007","author":[{"given":"Cyril","family":"Cetre","sequence":"first","affiliation":[]},{"given":"Chenle","family":"Yu","sequence":"additional","affiliation":[]},{"given":"Sara","family":"Royuela","sequence":"additional","affiliation":[]},{"given":"R\u00e9mi","family":"Barrere","sequence":"additional","affiliation":[]},{"given":"Eduardo","family":"Qui\u00f1ones","sequence":"additional","affiliation":[]},{"given":"Damien","family":"Gratadour","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,9,16]]},"reference":[{"key":"3_CR1","unstructured":"AMD: HIP documentation (2024). https:\/\/rocm.docs.amd.com\/projects\/HIP\/en\/latest\/"},{"key":"3_CR2","doi-asserted-by":"publisher","unstructured":"Amert, T., Otterness, N., Yang, M., Anderson, J.H., Smith, F.D.: GPU scheduling on the NVIDIA TX2: hidden details revealed. In: 2017 IEEE Real-Time Systems Symposium (RTSS), pp. 104\u2013115 (2017). 
https:\/\/doi.org\/10.1109\/RTSS.2017.00017","DOI":"10.1109\/RTSS.2017.00017"},{"key":"3_CR3","doi-asserted-by":"crossref","unstructured":"Barrere, R., Lenormand, E., Bui, D., Lee, E.A., Shaver, C., Tripakis, S.: An introduction to the pthales domain of Ptolemy II. Technical report UCB\/EECS-2011-32, EECS Department, University of California, Berkeley (2011). http:\/\/www2.eecs.berkeley.edu\/Pubs\/TechRpts\/2011\/EECS-2011-32.html","DOI":"10.21236\/ADA543228"},{"key":"3_CR4","doi-asserted-by":"publisher","unstructured":"Capodieci, N., Cavicchioli, R., Olmedo, I.S., Solieri, M., Bertogna, M.: Contending memory in heterogeneous SoCs: Evolution in NVIDIA tegra embedded platforms. In: 2020 IEEE 26th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 1\u201310 (2020). https:\/\/doi.org\/10.1109\/RTCSA50079.2020.9203722","DOI":"10.1109\/RTCSA50079.2020.9203722"},{"key":"3_CR5","unstructured":"Cetre, C., Ferreira, F., Sevin, A., Barrere, R., Gratadour, D.: Real-time high performance computing using a Jetson Xavier AGX. In: 11th European Congress Embedded Real Time System ( ERTS2022 ). Toulouse, France (2022). https:\/\/hal.science\/hal-03693764"},{"key":"3_CR6","doi-asserted-by":"publisher","unstructured":"Chen, J., Chen, J., Min, H., Wang, X.: Real-time embedded implementation of adaptive beamforming for medical ultrasound imaging. In: 2016 Sixth International Conference on Instrumentation & Measurement, Computer, Communication and Control (IMCCC), pp. 356\u2013360 (2016). https:\/\/doi.org\/10.1109\/IMCCC.2016.66","DOI":"10.1109\/IMCCC.2016.66"},{"key":"3_CR7","doi-asserted-by":"publisher","unstructured":"Chen, J., Yu, A.C.H., So, H.K.H.: Design considerations of real-time adaptive beamformer for medical ultrasound research using FPGA and GPU. In: 2012 International Conference on Field-Programmable Technology, pp. 198\u2013205 (2012). 
https:\/\/doi.org\/10.1109\/FPT.2012.6412134","DOI":"10.1109\/FPT.2012.6412134"},{"key":"3_CR8","unstructured":"Chenle\u00a0Yu, A.M.: Task record and replay mechanism in LLVM (2023). https:\/\/reviews.llvm.org\/D146642"},{"key":"3_CR9","doi-asserted-by":"publisher","unstructured":"Cheong, E., Liebman, J., Liu, J., Zhao, F.: TinyGALS: a programming model for event-driven embedded systems. In: Proceedings of the 2003 ACM Symposium on Applied Computing, pp. 698\u2013704. SAC 2003, Association for Computing Machinery, New York, NY, USA (2003). https:\/\/doi.org\/10.1145\/952532.952668","DOI":"10.1145\/952532.952668"},{"key":"3_CR10","unstructured":"Cl\u00e9net, Y., et al.: MICADO-MAORY SCAO Preliminary design, development plan & calibration strategies. In: Adaptive Optics for Extremely Large Telescopes conference, 6th edn. Qu\u00e9bec, Canada (2020). https:\/\/hal.science\/hal-03078430"},{"key":"3_CR11","doi-asserted-by":"publisher","unstructured":"Ferreira, F., Bernard, J., Sevin, A., Doucet, N., Gratadour, D.: Cosmic: a real-time platform for signal processing pipelines. In: 2022 IEEE Workshop on Signal Processing Systems (SiPS), pp.\u00a01\u20136 (2022). https:\/\/doi.org\/10.1109\/SiPS55645.2022.9919251","DOI":"10.1109\/SiPS55645.2022.9919251"},{"key":"3_CR12","doi-asserted-by":"publisher","unstructured":"Ferreira, F., et al.: Hard real-time core software of the AO RTC COSMIC platform: architecture and performance, p. 172 (2020). https:\/\/doi.org\/10.1117\/12.2561244","DOI":"10.1117\/12.2561244"},{"key":"3_CR13","doi-asserted-by":"publisher","unstructured":"Gupta, K., Stuart, J.A., Owens, J.D.: A study of persistent threads style GPU programming for GPGPU workloads. In: 2012 Innovative Parallel Computing (InPar), pp. 1\u201314 (2012). 
https:\/\/doi.org\/10.1109\/InPar.2012.6339596","DOI":"10.1109\/InPar.2012.6339596"},{"key":"3_CR14","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1146\/annurev-astro-081817-052000","volume":"56","author":"O Guyon","year":"2018","unstructured":"Guyon, O.: Extreme adaptive optics. Ann. Rev. Astron. Astrophys. 56, 315\u2013355 (2018)","journal-title":"Ann. Rev. Astron. Astrophys."},{"key":"3_CR15","doi-asserted-by":"crossref","unstructured":"Khalilov, M., Timoveev, A.: Performance analysis of CUDA, OpenACC and OpenMP programming models on TESLA V100 GPU. In: Journal of Physics: Conference Series, vol. 1740, p. 012056. IOP Publishing (2021)","DOI":"10.1088\/1742-6596\/1740\/1\/012056"},{"key":"3_CR16","doi-asserted-by":"publisher","unstructured":"Li, H., Yu, D., Kumar, A., Tu, Y.C.: Performance modeling in CUDA streams - a means for high-throughput data processing. In: 2014 IEEE International Conference on Big Data (Big Data), pp. 301\u2013310 (2014). https:\/\/doi.org\/10.1109\/BigData.2014.7004245","DOI":"10.1109\/BigData.2014.7004245"},{"key":"3_CR17","unstructured":"Ltd, C.S.: SYCL Graphs (2024). https:\/\/codeplay.com\/portal\/blogs\/2024\/01\/22\/sycl-graphs"},{"key":"3_CR18","unstructured":"NVIDIA: CUDA 10 Features Revealed: Turing, CUDA Graphs, and More (2018). https:\/\/developer.nvidia.com\/blog\/cuda-10-features-revealed\/"},{"key":"3_CR19","doi-asserted-by":"publisher","unstructured":"Olmedo, I.S., Capodieci, N., Martinez, J.L., Marongiu, A., Bertogna, M.: Dissecting the CUDA scheduling hierarchy: a performance and predictability perspective. In: 2020 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 213\u2013225 (2020). 
https:\/\/doi.org\/10.1109\/RTAS48715.2020.000-5","DOI":"10.1109\/RTAS48715.2020.000-5"},{"key":"3_CR20","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1007\/978-3-319-45550-1_9","volume-title":"OpenMP: Memory, Devices, and Tasks","author":"A Podobas","year":"2016","unstructured":"Podobas, A., Karlsson, S.: Towards unifying OpenMP under the task-parallel paradigm. In: Maruyama, N., de Supinski, B.R., Wahib, M. (eds.) IWOMP 2016. LNCS, vol. 9903, pp. 116\u2013129. Springer, Cham (2016). https:\/\/doi.org\/10.1007\/978-3-319-45550-1_9"},{"key":"3_CR21","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1007\/978-3-030-28596-8_15","volume-title":"OpenMP: Conquering the Full Hardware Spectrum","author":"A Rico","year":"2019","unstructured":"Rico, A., S\u00e1nchez Barrera, I., Joao, J.A., Randall, J., Casas, M., Moret\u00f3, M.: On the benefits of tasking with OpenMP. In: Fan, X., de Supinski, B.R., Sinnen, O., Giacaman, N. (eds.) IWOMP 2019. LNCS, vol. 11718, pp. 217\u2013230. Springer, Cham (2019). https:\/\/doi.org\/10.1007\/978-3-030-28596-8_15"},{"key":"3_CR22","unstructured":"Todd, A.: Improving real-time performance with CUDA persistent threads (CuPer) on the Jetson TX2. Concurrent Real-Time White Paper (2018)"},{"issue":"4","key":"3_CR23","doi-asserted-by":"publisher","first-page":"805","DOI":"10.1109\/TPDS.2021.3097283","volume":"33","author":"CR Trott","year":"2021","unstructured":"Trott, C.R., et al.: Kokkos 3: Programming model extensions for the exascale era. IEEE Trans. Parallel Distrib. Syst. 33(4), 805\u2013817 (2021)","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"3_CR24","doi-asserted-by":"crossref","unstructured":"Yu, C., Royuela, S., Qui\u00f1ones, E.: OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices. 
In: Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems, pp. 42\u201347. SCOPES 2020, Association for Computing Machinery (2020)","DOI":"10.1145\/3378678.3391881"},{"key":"3_CR25","doi-asserted-by":"crossref","unstructured":"Yu, C., Royuela, S., Qui\u00f1ones, E.: Taskgraph: a low contention OpenMP tasking framework. IEEE Transactions on Parallel and Distributed Systems (2023)","DOI":"10.1109\/TPDS.2023.3284219"},{"key":"3_CR26","doi-asserted-by":"publisher","unstructured":"Yu, C., Royuela, S., Qui\u00f1ones, E.: Enhancing heterogeneous computing through OpenMP and GPU graph. In: 53rd International Conference on Parallel Processing (2024). https:\/\/doi.org\/10.1145\/3673038.3673050","DOI":"10.1145\/3673038.3673050"}],"container-title":["Lecture Notes in Computer Science","Advancing OpenMP for Future Accelerators"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-72567-8_3","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T16:21:54Z","timestamp":1726762914000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-72567-8_3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"ISBN":["9783031725661","9783031725678"],"references-count":26,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-72567-8_3","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"16 September 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"IWOMP","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International 
Workshop on OpenMP","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Perth, WA","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Australia","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"23 September 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26 September 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"20","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"iwomp2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/www.iwomp.org","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}