{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:13:33Z","timestamp":1760058813726,"version":"build-2065373602"},"reference-count":96,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Umm al-Qura University","award":["25UQU4350478GSSR01S"],"award-info":[{"award-number":["25UQU4350478GSSR01S"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>The growing adoption of supercomputers across various scientific disciplines, particularly by researchers without a background in computer science, has intensified the demand for parallel applications. These applications are typically developed using a combination of programming models within languages such as C, C++, and Fortran. However, modern multi-core processors and accelerators necessitate fine-grained control to achieve effective parallelism, complicating the development process. To address this, developers commonly utilize high-level programming models such as Open Multi-Processing (OpenMP), Open Accelerators (OpenACCs), Message Passing Interface (MPI), and Compute Unified Device Architecture (CUDA). These models may be used independently or combined into dual- or tri-model applications to leverage their complementary strengths. However, integrating multiple models introduces subtle and difficult-to-detect runtime errors such as data races, deadlocks, and livelocks that often elude conventional compilers. This complexity is exacerbated in applications that simultaneously incorporate MPI, OpenMP, and CUDA, where the origin of runtime errors, whether from individual models, user logic, or their interactions, becomes ambiguous. Moreover, existing tools are inadequate for detecting such errors in tri-model applications, leaving a critical gap in development support. To address this gap, the present study introduces a static analysis tool designed specifically for tri-model applications combining MPI, OpenMP, and CUDA in C++-based environments. The tool analyzes source code to identify both actual and potential runtime errors prior to execution. Central to this approach is the introduction of error dependency graphs, a novel mechanism for systematically representing and analyzing error correlations in hybrid applications. By offering both error classification and comprehensive static detection, the proposed tool enhances error visibility and reduces manual testing effort. This contributes significantly to the development of more robust parallel applications for high-performance computing (HPC) and future exascale systems.<\/jats:p>","DOI":"10.3390\/computers14050164","type":"journal-article","created":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T11:48:33Z","timestamp":1745840913000},"page":"164","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Error Classification and Static Detection Methods in Tri-Programming Models: MPI, OpenMP, and CUDA"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0896-503X","authenticated-orcid":false,"given":"Saeed Musaad","family":"Altalhi","sequence":"first","affiliation":[{"name":"Department of Computer Science, Faculty of Computing, and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia"},{"name":"Department of Computer Science and Artificial Intelligence, Umm Al-Qura University, Makkah 21955, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3987-9051","authenticated-orcid":false,"given":"Fathy Elbouraey","family":"Eassa","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Computing, and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0806-1396","authenticated-orcid":false,"given":"Sanaa Abdullah","family":"Sharaf","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Computing, and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7644-5039","authenticated-orcid":false,"given":"Ahmed Mohammed","family":"Alghamdi","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, College of Computer Science and Engineering, University of Jeddah, Jeddah 21493, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3104-209X","authenticated-orcid":false,"given":"Khalid Ali","family":"Almarhabi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, College of Computing at Alqunfudah, Umm Al-Qura University, Makkah 21514, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2226-4955","authenticated-orcid":false,"given":"Rana Ahmad Bilal","family":"Khalid","sequence":"additional","affiliation":[{"name":"College of Engineering and Physical Sciences, Aston University, Aston Triangle, Birmingham B4 7ET, UK"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"6541","DOI":"10.1109\/TAP.2019.2920253","article-title":"Finite Difference Generated Transient Potentials of Open-Layered Media by Parallel Computing Using OpenMP, MPI, OpenACC, and CUDA","volume":"67","year":"2019","journal-title":"IEEE Trans. Antennas Propag."},{"key":"ref_2","unstructured":"(2023, February 06). MPI Forum MPI Documents. Available online: https:\/\/www.mpi-forum.org\/docs\/."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"80358","DOI":"10.1109\/ACCESS.2020.2991009","article-title":"ACC_TEST: Hybrid Testing Approach for OpenACC-Based Programs","volume":"8","author":"Eassa","year":"2020","journal-title":"IEEE Access"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1007\/978-3-319-52709-3_10","article-title":"An Extended Polyhedral Model for SPMD Programs and Its Use in Static Data Race Detection","volume":"Volume 10136 LNCS","author":"Chatarasi","year":"2017","journal-title":"Lecture Notes in Computer Science"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jocs.2015.04.022","article-title":"A Case Study of CUDA FORTRAN and OpenACC for an Atmospheric Climate Kernel","volume":"9","author":"Norman","year":"2015","journal-title":"J. Comput. Sci."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Hoshino, T., Maruyama, N., Matsuoka, S., and Takaki, R. (2013, January 13\u201316). CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application. Proceedings of the 13th IEEE\/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Delft, The Netherlands.","DOI":"10.1109\/CCGrid.2013.12"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sunitha, N.V., Raju, K., and Chiplunkar, N.N. (2017, January 10\u201311). Performance Improvement of CUDA Applications by Reducing CPU-GPU Data Transfer Overhead. Proceedings of the 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India.","DOI":"10.1109\/ICICCT.2017.7975190"},{"key":"ref_8","unstructured":"(2023, February 06). NVIDIA About CUDA|NVIDIA Developer. Available online: https:\/\/developer.nvidia.com\/about-cuda."},{"key":"ref_9","unstructured":"(2023, February 06). OpenMP ARB About Us\u2014OpenMP. Available online: https:\/\/www.openmp.org\/about\/about-us\/."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Jin, Z., and Finkel, H. (2018, January 14\u201316). Performance-Oriented Optimizations for OpenCL Streaming Kernels on the FPGA. Proceedings of the IWOCL\u201918: International Workshop on OpenCL, Oxford, UK.","DOI":"10.1145\/3204919.3204920"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1016\/j.jpdc.2021.07.006","article-title":"MDScale: Scalable Multi-GPU Bonded and Short-Range Molecular Dynamics","volume":"157","author":"Barreales","year":"2021","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1177\/10943420211008288","article-title":"GPU-Accelerated Molecular Dynamics: State-of-Art Software Performance and Porting from Nvidia CUDA to AMD HIP","volume":"35","author":"Kondratyuk","year":"2021","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Strout, M.M., De Supinski, B.R., Scogland, T.R.W., Davis, E.C., and Olschanowsky, C. (2018, January 26\u201328). Evolving OpenMP for Evolving Architectures. Proceedings of the 14th International Workshop on OpenMP, IWOMP 2018, Barcelona, Spain. Proceedings.","DOI":"10.1007\/978-3-319-98521-3"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1007\/978-3-319-11454-5_5","article-title":"Classification of Common Errors in OpenMP Applications","volume":"Volume 8766","author":"DeRose","year":"2014","journal-title":"Lecture Notes in Computer Science"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2004","DOI":"10.1109\/JPROC.2018.2853600","article-title":"The Ongoing Evolution of OpenMP","volume":"106","author":"Supinski","year":"2018","journal-title":"Proc. IEEE"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1007\/978-3-319-98521-3_4","article-title":"Extending OpenMP to Facilitate Loop Optimization","volume":"Volume 11128","author":"Bertolacci","year":"2018","journal-title":"Lecture Notes in Computer Science"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sato, M., Hanawa, T., M\u00fcller, M.S., Chapman, B.M., and de Supinski, B.R. (2010). Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-642-13217-9"},{"key":"ref_18","first-page":"64","article-title":"Compute Unified Device Architecture (CUDA) GPU Programming Model and Possible Integration to the Parallel Environment","volume":"3","author":"Harakal","year":"2008","journal-title":"Sci. Mil. J."},{"key":"ref_19","unstructured":"Saillard, E. (2015). Static\/Dynamic Analyses for Validation and Improvements of Multi-Model HPC Applications. [Ph.D. Thesis, Universit\u00e9 de Bordeaux]. Volume 1228."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"117808","DOI":"10.1109\/ACCESS.2022.3219406","article-title":"Errors Classification and Static Detection Techniques for Dual-Programming Model (OpenMP and OpenACC)","volume":"10","author":"Basloom","year":"2022","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1109\/TSE.2016.2537335","article-title":"Dynamic Testing for Deadlocks via Constraints","volume":"42","author":"Cai","year":"2016","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_22","first-page":"279","article-title":"Static\/Dynamic Validation of MPI Collective Communications in Multi-Threaded Context","volume":"Volume 2015","author":"Saillard","year":"2015","journal-title":"Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP"},{"key":"ref_23","unstructured":"(2023, June 18). Intel\u00ae Trace Analyzer and Collector Available. Available online: https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/trace-analyzer-collector\/user-guide-reference\/2023-1\/correctness-checking-of-mpi-applications.html."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Droste, A., Kuhn, M., and Ludwig, T.M.-C. (2015, January 15). MPI-Checker-Static Analysis for MPI. Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Austin, TX, USA.","DOI":"10.1145\/2833157.2833159"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Resch, M., Keller, R., Himmler, V., Krammer, B., and Schulz, A. (2008). Enhanced Memory Debugging of MPI-Parallel Applications in Open MPI. Tools for High Performance Computing, Springer.","DOI":"10.1007\/978-3-540-68564-7"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Vetter, J.S., and de Supinski, B.R. (2000, January 4\u201310). Dynamic Software Testing of MPI Applications with Umpire. Proceedings of SC \u201800: Proceedings of the 2000 ACM\/IEEE Conference on Supercomputing, Dallas, TX, USA.","DOI":"10.1109\/SC.2000.10055"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hilbrich, T., Schulz, M., de Supinski, B.R., and M\u00fcller, M.S. (2010). MUST: A Scalable Approach to Runtime Error Detection in MPI Programs. Tools for High Performance Computing 2009, Springer.","DOI":"10.1007\/978-3-642-11261-4_5"},{"key":"ref_28","unstructured":"Kranzlmueller, D., Schaubschlaeger, C., and Volkert, J. (2000, January 28\u201330). A Brief Overview of the MAD Debugging Activities. Proceedings of the AADEBUG 2000, 4th International Workshop on Automated Testing, Munich, Germany."},{"key":"ref_29","unstructured":"(2023, February 19). MUST\u2014RWTH AACHEN UNIVERSITY Lehrstuhl F\u00fcr Informatik 12\u2014Deutsch. Available online: https:\/\/www.i12.rwth-aachen.de\/cms\/Lehrstuhl-fuer-Informatik\/Forschung\/Forschungsschwerpunkte\/Lehrstuhl-fuer-Hochleistungsrechnen\/~nrbe\/MUST\/."},{"key":"ref_30","first-page":"1","article-title":"MPI Runtime Error Detection with MUST: Advances in Deadlock Detection","volume":"Volume 21","author":"Hilbrich","year":"2012","journal-title":"Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3095075","article-title":"Precise Predictive Analysis for Discovering Communication Deadlocks in MPI Programs","volume":"39","author":"Forejt","year":"2017","journal-title":"ACM Trans. Program."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"M\u00fcller, M.S., Resch, M.M., Schulz, A., and Nagel, W.E. (2010). The Importance of Run-Time Error Detection. Tools for High Performance Computing 2009, Springer.","DOI":"10.1007\/978-3-642-11261-4"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Saillard, E., Carribault, P., and Barthou, D. (2013, January 15\u201318). Combining Static and Dynamic Validation of MPI Collective Communications. Proceedings of the 20th European MPI Users\u2019 Group Meeting, Madrid, Spain.","DOI":"10.1145\/2488551.2488555"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"91488","DOI":"10.1109\/ACCESS.2020.2994172","article-title":"ACC_TEST: Hybrid Testing Techniques for MPI-Based Programs","volume":"8","author":"Alghamdi","year":"2020","journal-title":"IEEE Access"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chapman, B.M., Gropp, W.D., Kumaran, K., and M\u00fcller, M.S. (2011). OmpVerify: Polyhedral Analysis for the OpenMP Programmer. OpenMP in the Petascale Era, Springer. IWOMP 2011 Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-642-21487-5"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ye, F., Schordan, M., Liao, C., Lin, P.-H., Karlin, I., and Sarkar, V. (2018, January 12). Using Polyhedral Analysis to Verify OpenMP Applications Are Data Race Free. Proceedings of the 2018 IEEE\/ACM 2nd International Workshop on Software Correctness for HPC Applications (Correctness), Dallas, TX, USA.","DOI":"10.1109\/Correctness.2018.00010"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Jannesari, A., Bao, K., Pankratius, V., and Tichy, W.F. (2009, January 23\u201329). Helgrind+: An Efficient Dynamic Race Detector. Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome Italy.","DOI":"10.1109\/IPDPS.2009.5160998"},{"key":"ref_38","unstructured":"(2023, March 08). Valgrind: Tool Suite. Available online: https:\/\/valgrind.org\/info\/tools.html#memcheck."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1145\/1273442.1250746","article-title":"Valgrind","volume":"42","author":"Nethercote","year":"2007","journal-title":"ACM Sigplan Not."},{"key":"ref_40","first-page":"669","article-title":"Comparing Intel Thread Checker and Sun Thread Analyzer","volume":"15","author":"Terboven","year":"2008","journal-title":"Adv. Parallel Comput."},{"key":"ref_41","unstructured":"(2023, March 08). Intel(R) Thread Checker 3.1 Release Notes. Available online: https:\/\/registrationcenter-download.intel.com\/akdlm\/irc_nas\/1366\/ReleaseNotes.htm."},{"key":"ref_42","unstructured":"(2023, March 08). Sun Studio 12: Thread Analyzer User\u2019s Guide. Available online: https:\/\/docs.oracle.com\/cd\/E19205-01\/820-0619\/820-0619.pdf."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Gu, Y., and Mellor-Crummey, J. (2018, January 11\u201316). Dynamic Data Race Detection for OpenMP Programs. Proceedings of the SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA.","DOI":"10.1109\/SC.2018.00064"},{"key":"ref_44","unstructured":"Serebryany, K., Bruening, D., Potapenko, A., and Vyukov, D. (2012, January 13\u201315). AddressSanitizer: A Fast Address Sanity Checker. Proceedings of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), Boston, MA, USA."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1007\/978-3-642-29860-8_9","article-title":"Dynamic Race Detection with LLVM Compiler: Compile-Time Instrumentation for ThreadSanitizer","volume":"Volume 7186 LNCS","author":"Serebryany","year":"2012","journal-title":"Proceedings of the International Conference on Runtime Verification"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Atzeni, S., Gopalakrishnan, G., Rakamaric, Z., Ahn, D.H., Laguna, I., Schulz, M., Lee, G.L., Protze, J., and Muller, M.S. (2016, January 23\u201327). ARCHER: Effectively Spotting Data Races in Large OpenMP Applications. Proceedings of the 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016, Chicago, IL, USA.","DOI":"10.1109\/IPDPS.2016.68"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1007\/978-3-540-79561-2_3","article-title":"Detection of Violations to the Mpi Standard in Hybrid Openmp\/Mpi Applications","volume":"Volume 5004 LNCS","author":"Hilbrich","year":"2008","journal-title":"Lecture Notes in Computer Science"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1016\/S0927-5452(04)80063-7","article-title":"MARMOT: An MPI Analysis and Checking Tool","volume":"13","author":"Krammer","year":"2004","journal-title":"Adv. Parallel Comput."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Betts, A., Chong, N., Donaldson, A.F., Qadeer, S., and Thomson, P. (2012, January 19\u201326). GPU Verify: A Verifier for GPU Kernels. Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA, New York, NY, USA.","DOI":"10.1145\/2384616.2384625"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1007\/978-3-319-08867-9_15","article-title":"Engineering a Static Verification Tool for GPU Kernels","volume":"Volume 8559 LNCS","author":"Bardsley","year":"2014","journal-title":"Lecture Notes in Computer Science"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Gupta, S., Sultan, F., Cadambi, S., Ivanci\u0107, F., and Rotteler, M. (2009, January 23\u201329). Using Hardware Transactional Memory for Data Race Detection. Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy.","DOI":"10.1109\/IPDPS.2009.5161006"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1007\/978-3-642-40787-1_12","article-title":"Accelerating Data Race Detection Utilizing On-Chip Data-Parallel Cores","volume":"Volume 8174","author":"Mekkat","year":"2013","journal-title":"Runtime Verification: 4th International Conference, RV 2013, Rennes, France, 24\u201327 September 2013"},{"key":"ref_53","unstructured":"Bekar, C., Elmas, T., Okur, S., and Tasiran, S. (2012, January 3). KUDA: GPU Accelerated Split Race Checker. Proceedings of the Workshop on Determinism and Correctness in Parallel Programming (WoDet), London, UK."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1109\/TPDS.2013.44","article-title":"GMRace: Detecting Data Races in GPU Programs via a Low-Overhead Scheme","volume":"25","author":"Zheng","year":"2014","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Zheng, M., Ravi, V.T., Qin, F., and Agrawal, G. (2011, January 12\u201316). GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, San Antonio, TX USA.","DOI":"10.1145\/1941553.1941574"},{"key":"ref_56","first-page":"113","article-title":"Parallelized Race Detection Based on GPU Architecture","volume":"451 CCIS","author":"Dai","year":"2014","journal-title":"Commun. Comput. Inf. Sci."},{"key":"ref_57","unstructured":"Boyer, M., Skadron, K., and Weimer, W. (2016, January 1). Automated Dynamic Analysis of CUDA Programs. Proceedings of the Third Workshop on Software Tools for MultiCore Systems, Amsterdam Netherlands."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Li, P., Li, G., and Gopalakrishnan, G. (2014, January 16\u201321). Practical Symbolic Race Checking of GPU Programs. Proceedings of the SC\u201914: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA. Volume 14.","DOI":"10.1109\/SC.2014.20"},{"key":"ref_59","unstructured":"Clemencon, C., Fritscher, J., and Ruhl, R. (1995, January 22\u201325). Visualization, Execution Control and Replay of Massively Parallel Programs within Annai\u2019s Debugging Tool. Proceedings of the High-Performance Computing Symposium (HPCS\u201995), Raleigh, NC, USA."},{"key":"ref_60","unstructured":"Bronevetsky, G., Laguna, I., Bagchi, S., De Supinski, B.R., Ahn, D.H., and Schulz, M. (July, January 28). AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks. Proceedings of the International Conference on Dependable Systems and Networks, Chicago, IL, USA."},{"key":"ref_61","unstructured":"(2023, October 31). Linaro DDT. Available online: https:\/\/www.linaroforge.com\/linaro-ddt."},{"key":"ref_62","unstructured":"(2023, October 31). Allinea DDT|HPC@LLNL, Available online: https:\/\/hpc.llnl.gov\/software\/development-environment-software\/allinea-ddt."},{"key":"ref_63","unstructured":"(2023, July 14). Totalview Technologies: Totalview\u2014Parallel and Thread Debugger. Available online: https:\/\/help.totalview.io\/."},{"key":"ref_64","unstructured":"(2023, October 31). TotalView Debugger|HPC@LLNL, Available online: https:\/\/hpc.llnl.gov\/software\/development-environment-software\/totalview-debugger."},{"key":"ref_65","unstructured":"Claudio, A.P., Cunha, J.D., and Carmo, M.B. (2000, January 19\u201321). Monitoring and Debugging Message Passing Applications with MPVisualizer. Proceedings of the 8th Euromicro Workshop on Parallel and Distributed Processing, Rhodes, Greece."},{"key":"ref_66","unstructured":"(2023, March 08). Intel Inspector|HPC@LLNL, Available online: https:\/\/hpc.llnl.gov\/software\/development-environment-software\/intel-inspector."},{"key":"ref_67","unstructured":"(2023, October 31). Documentation\u2014Arm DDT, Available online: https:\/\/www.alcf.anl.gov\/support-center\/training\/debugging-arm."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Saad, S., Fadel, E., Alzamzami, O., Eassa, F., and Alghamdi, A.M. (2024). Temporal-Logic-Based Testing Tool Architecture for Dual-Programming Model Systems. Computers, 13.","DOI":"10.3390\/computers13040086"},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"113235","DOI":"10.1109\/ACCESS.2019.2935498","article-title":"Openacc Errors Classification and Static Detection Techniques","volume":"7","author":"Alghamdi","year":"2019","journal-title":"IEEE Access"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Checkaraou, A.W.M., Rousset, A., Besseron, X., Varrette, S., and Peters, B. (2018, January 24\u201327). Hybrid MPI+openMP Implementation of EXtended Discrete Element Method. Proceedings of the 2018 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD, Lyon, France.","DOI":"10.1109\/CAHPC.2018.8645880"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1080\/10618562.2019.1617856","article-title":"MPI+X: Task-Based Parallelisation and Dynamic Load Balance of Finite Element Assembly","volume":"33","author":"Houzeaux","year":"2019","journal-title":"Int. J. Comput Fluid Dyn."},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Altalhi, S.M., Eassa, F.E., Al-Ghamdi, A.S.A.M., Sharaf, S.A., Alghamdi, A.M., Almarhabi, K.A., and Khemakhem, M.A. (2023). An Architecture for a Tri-Programming Model-Based Parallel Hybrid Testing Tool. Appl. Sci., 13.","DOI":"10.3390\/app132111960"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Freire, Y.N., and Senger, H. (2023). Integrating CUDA Memory Management Mechanisms for Domain Decomposition of an Acoustic Wave Kernel Implemented in OpenMP. Escola Regional de Alto Desempenho de S\u00e3o Paulo (ERAD-SP), SBC.","DOI":"10.5753\/eradsp.2023.231895"},{"key":"ref_74","first-page":"8862123","article-title":"Hybrid MPI and CUDA Parallelization for CFD Applications on Multi-GPU HPC Clusters","volume":"2020","author":"Lai","year":"2020","journal-title":"Sci. Program."},{"key":"ref_75","first-page":"19","article-title":"Concurrent Deadlock Detection in Parallel Programs","volume":"28","author":"Haque","year":"2006","journal-title":"Int. J. Comput. Appl."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Eslamimehr, M., and Palsberg, J. (2014, January 16\u201321). Sherlock: Scalable Deadlock Detection for Concurrent Programs. Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, (FSE 2014). Association for Computing Machinery, Hong Kong China.","DOI":"10.1145\/2635868.2635918"},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1147\/JRD.2010.2060276","article-title":"Detection of Deadlock Potentials in Multithreaded Programs","volume":"54","author":"Agarwal","year":"2010","journal-title":"IBM J. Res. Dev."},{"key":"ref_78","unstructured":"(2025, April 16). OpenMP Application Programming Interface. Available online: https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5-2.pdf."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1016\/j.cpc.2010.06.035","article-title":"Hybrid CUDA, OpenMP, and MPI Parallel Programming on Multicore GPU Clusters","volume":"182","author":"Yang","year":"2011","journal-title":"Comput. Phys. Commun."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"e5728","DOI":"10.1002\/cpe.5728","article-title":"Heterogeneous Computing with OpenMP and Hydra","volume":"32","author":"Diener","year":"2020","journal-title":"Concurr. Comput."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Akhmetova, D., Iakymchuk, R., Ekeberg, O., and Laure, E. (June, January 29). Performance Study of Multithreaded MPI and Openmp Tasking in a Large Scientific Code. Proceedings of the 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Orlando, FL, USA.","DOI":"10.1109\/IPDPSW.2017.128"},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1177\/10943420231188079","article-title":"Heterogeneous Programming Using OpenMP and CUDA\/HIP for Hybrid CPU-GPU Scientific Applications","volume":"37","author":"Morancho","year":"2023","journal-title":"Int. J. High. Perform. Comput. Appl."},{"key":"ref_83","first-page":"18","article-title":"High-Performance Computing for Computational Science","volume":"32","author":"Senger","year":"2020","journal-title":"Concurr. Comput."},{"key":"ref_84","unstructured":"Aji, A.M., Panwar, L.S., Ji, F., Chabbi, M., Murthy, K., Balaji, P., Bisset, K.R., Dinan, J., Feng, W.C., and Mellor-Crummey, J. (July, January 27). On the Efficacy of GPU-Integrated MPI for Scientific Applications. Proceedings of the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing\u2014HPDC, Minneapolis, MN, USA."},{"key":"ref_85","unstructured":"Gottschlich, J., and Boehm, H. (2013, January 17). Generic Programming Needs Transactional Memory. Proceedings of the Transact 2013: 8th ACM SIGPLAN Workshop on Transactional Computing, Houston, TX, USA."},{"key":"ref_86","first-page":"211","article-title":"Generic Locking and Deadlock-Prevention with C++","volume":"15","author":"Suess","year":"2008","journal-title":"Adv. Parallel Comput."},{"key":"ref_87","unstructured":"(2024, October 14). NAS Parallel Benchmarks Version 3.4.3, Available online: https:\/\/www.nas.nasa.gov\/software\/npb.html."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1007\/s10766-010-0137-2","article-title":"Performance Evaluation of Mixed-Mode OpenMP\/MPI Implementations","volume":"38","author":"Bull","year":"2010","journal-title":"Int. J. Parallel Program."},{"key":"ref_89","unstructured":"Grove, D.A., and Coddington, P.D. (2024, January 25\u201327). Precise MPI Performance Measurement Using MPIBench. Proceedings of the HPC Asia, Nagoya, Japan."},{"key":"ref_90","unstructured":"(2024, October 14). GitHub\u2014LLNL\/MpiBench: MPI Benchmark to Test and Measure Collective Performance. Available online: https:\/\/github.com\/LLNL\/mpiBench."},{"key":"ref_91","unstructured":"Liao, C., Lin, P.H., Asplund, J., Schordan, M., and Karlin, I. (2020, January 9\u201319). DataRaceBench: A Benchmark Suite for Systematic Evaluation of Data Race Detection Tools. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA."},{"key":"ref_92","unstructured":"(2024, October 14). GitHub\u2014LLNL\/Dataracebench: Data Race Benchmark Suite for Evaluating OpenMP Correctness Tools Aimed to Detect Data Races. Available online: https:\/\/github.com\/LLNL\/dataracebench."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Griebler, D., Loff, J., Mencagli, G., Danelutto, M., and Fernandes, L.G. (2018, January 21\u201323). Efficient NAS Benchmark Kernels with C++ Parallel Programming. Proceedings of the 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), Cambridge, UK.","DOI":"10.1109\/PDP2018.2018.00120"},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1016\/j.future.2021.07.021","article-title":"The NAS Parallel Benchmarks for Evaluating C++ Parallel Programming Frameworks on Shared-Memory Architectures","volume":"125","author":"Griebler","year":"2021","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_95","unstructured":"(2024, October 14). GitHub\u2014RWTH-HPC\/DRACC: Benchmarks for Data Race Detection on Accelerators. Available online: https:\/\/github.com\/RWTH-HPC\/DRACC."},{"key":"ref_96","unstructured":"Altalhi, S.M., Eassa, F.E., Alghamdi, A.M., and Khalid, R.A.B. (2025, April 22). Static-Tools-for-Detecting-Tri-Level-Programming-Models-MPI-OpenMP-CUDA-MOC-: Static Analysis Components for Tri-Level-Programming Model Using MPI, OpenMP, and CUDA (MOC). Available online: https:\/\/github.com\/saeedaltalhi\/Static-Tools-for-Detecting-Tri-Level-Programming-Models-MPI-OpenMP-CUDA-MOC-."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/5\/164\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:23:38Z","timestamp":1760030618000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/5\/164"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,28]]},"references-count":96,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["computers14050164"],"URL":"https:\/\/doi.org\/10.3390\/computers14050164","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,4,28]]}}}