{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:36:42Z","timestamp":1760150202037,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2023,10,31]],"date-time":"2023-10-31T00:00:00Z","timestamp":1698710400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Saint Petersburg State University","award":["86066736","94062114"],"award-info":[{"award-number":["86066736","94062114"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Bottlenecks and imbalance in parallel programs can significantly affect performance of parallel execution. Finding these bottlenecks is a key issue in performance analysis of MPI programs especially on a large scale. One of the ways to discover bottlenecks is to analyze the critical path of the parallel program: the longest execution path in the program activity graph. There are a number of methods of finding the critical path; however, most of them suffer a performance drop when scaled. In this paper, we analyze several methods of critical path finding based on classical Dijkstra and Delta-stepping algorithms along with the proposed algorithm based on topological sorting. Corresponding algorithms for each approach are presented including additional enhancements for increasing performance. The implementation of the algorithms and resulting performance for several benchmark applications (NAS Parallel Benchmarks, CP2K, OpenFOAM, LAMMPS, and MiniFE) are analyzed and discussed.<\/jats:p>","DOI":"10.3390\/a16110505","type":"journal-article","created":{"date-parts":[[2023,10,31]],"date-time":"2023-10-31T12:48:31Z","timestamp":1698756511000},"page":"505","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Finding Bottlenecks in Message Passing Interface Programs by Scalable Critical Path Analysis"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2458-3194","authenticated-orcid":false,"given":"Vladimir","family":"Korkhov","sequence":"first","affiliation":[{"name":"Faculty of Applied Mathematics and Control Processes, Saint Petersburg State University, Universitetskaya emb. 7-9, St. Petersburg 199034, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7067-6928","authenticated-orcid":false,"given":"Ivan","family":"Gankevich","sequence":"additional","affiliation":[{"name":"Faculty of Applied Mathematics and Control Processes, Saint Petersburg State University, Universitetskaya emb. 7-9, St. Petersburg 199034, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2128-8368","authenticated-orcid":false,"given":"Anton","family":"Gavrikov","sequence":"additional","affiliation":[{"name":"Faculty of Applied Mathematics and Control Processes, Saint Petersburg State University, Universitetskaya emb. 7-9, St. Petersburg 199034, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4084-3179","authenticated-orcid":false,"given":"Maria","family":"Mingazova","sequence":"additional","affiliation":[{"name":"Faculty of Applied Mathematics and Control Processes, Saint Petersburg State University, Universitetskaya emb. 7-9, St. Petersburg 199034, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5835-9313","authenticated-orcid":false,"given":"Ivan","family":"Petriakov","sequence":"additional","affiliation":[{"name":"Faculty of Applied Mathematics and Control Processes, Saint Petersburg State University, Universitetskaya emb. 7-9, St. Petersburg 199034, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2928-7098","authenticated-orcid":false,"given":"Dmitrii","family":"Tereshchenko","sequence":"additional","affiliation":[{"name":"Faculty of Applied Mathematics and Control Processes, Saint Petersburg State University, Universitetskaya emb. 7-9, St. Petersburg 199034, Russia"}]},{"given":"Artem","family":"Shatalin","sequence":"additional","affiliation":[{"name":"Huawei, Nizhny Novgorod 603006, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7743-5350","authenticated-orcid":false,"given":"Vitaly","family":"Slobodskoy","sequence":"additional","affiliation":[{"name":"Huawei, Nizhny Novgorod 603006, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2023,10,31]]},"reference":[{"key":"ref_1","unstructured":"(2023, October 18). MPI Forum. Available online: https:\/\/www.mpi-forum.org\/."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Schulz, M. (2005, January 27\u201330). Extracting Critical Path Graphs from MPI Applications. Proceedings of the 2005 IEEE International Conference on Cluster Computing, Burlington, MA, USA.","DOI":"10.1109\/CLUSTR.2005.347035"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1145\/79173.79181","article-title":"A bridging model for parallel computation","volume":"33","author":"Valiant","year":"1990","journal-title":"Commun. ACM"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bailey, D.H., Schreiber, R.S., Simon, H.D., Venkatakrishnan, V., Weeratunga, S.K., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., and Dagum, L. (1991, January 18\u201322). The NAS parallel benchmarks\u2014Summary and preliminary results. Proceedings of the 1991 ACM\/IEEE Conference on Supercomputing\u2014Supercomputing\u201991, Albuquerque, NM, USA.","DOI":"10.1145\/125826.125925"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"194103","DOI":"10.1063\/5.0007045","article-title":"CP2K: An electronic structure and molecular dynamics software package\u2014Quickstep: Efficient and accurate electronic structure calculations","volume":"152","author":"Iannuzzi","year":"2020","journal-title":"J. Chem. Phys."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"10817","DOI":"10.1016\/j.cpc.2021.108171","article-title":"LAMMPS\u2014A flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales","volume":"271","author":"Thompson","year":"2022","journal-title":"Comp. Phys. Comm."},{"key":"ref_7","first-page":"354","article-title":"OpenFOAM for computational fluid dynamics","volume":"61","author":"Chen","year":"2014","journal-title":"N. Am. Math. Soc."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"5374","DOI":"10.1002\/cpe.3587","article-title":"Assessing a mini-application as a performance proxy for a finite element method engineering application","volume":"27","author":"Lin","year":"2015","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_9","unstructured":"Yang, C.Q., and Miller, B.P. (1988, January 13\u201317). Critical Path Analysis for the Execution of Parallel and Distributed Programs. Proceedings of the 8th International Conference on Distributed Computing Systems, San Jose, CA, USA. Available online: https:\/\/ftp.cs.wisc.edu\/paradyn\/papers\/CritPath-ICDCS1988.pdf."},{"key":"ref_10","first-page":"871","article-title":"Efficient Algorithms for the Longest Path Problem","volume":"15","author":"Fleischer","year":"2005","journal-title":"Algorithms and Computation, Proceedings of the 15th International Symposium, ISAAC 2004, Hong Kong, China, 20\u201322 December 2004"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1016\/S0196-6774(03)00076-2","article-title":"\u0394-stepping: A parallelizable shortest path algorithm","volume":"49","author":"Meyer","year":"2003","journal-title":"J. Algorithms"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2031","DOI":"10.1109\/TPDS.2016.2634535","article-title":"Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems","volume":"28","author":"Chakaravarthy","year":"2017","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_13","unstructured":"Kranjvcevi\u2019c, M., Palossi, D., and Pintarelli, S. (2016). Parallel Delta-Stepping Algorithm for Shared Memory Architectures. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1080\/13658810801949850","article-title":"Finding shortest paths on real road networks: The case for A*","volume":"23","author":"Zeng","year":"2009","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Alves, D.R., Krishnakumar, M.S., and Garg, V.K. (2020, January 5\u20138). Efficient Parallel Shortest Path Algorithms. Proceedings of the 2020 19th International Symposium on Parallel and Distributed Computing (ISPDC), Warsaw, Poland.","DOI":"10.1109\/ISPDC51135.2020.00034"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1145\/358690.358717","article-title":"Distributed computation on graphs: Shortest path algorithms","volume":"25","author":"Chandy","year":"1982","journal-title":"Commun. ACM"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1029","DOI":"10.1109\/71.730530","article-title":"Critical Path Profiling of Message Passing and Shared-Memory Programs","volume":"9","author":"Hollingsworth","year":"1998","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Dooley, I., Arya, A., and Kal\u00e9, L.V. (2010, January 19\u201323). Detecting and using critical paths at runtime in message driven parallel programs. Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), Atlanta, GA, USA.","DOI":"10.1109\/IPDPSW.2010.5470844"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"B\u00f6hme, D., Wolf, F.A., and Geimer, M. (2012, January 21\u201325). Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Shanghai, China.","DOI":"10.1109\/IPDPSW.2012.321"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/cpe.1556","article-title":"The SCALASCA performance toolset architecture","volume":"22","author":"Geimer","year":"2010","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_21","unstructured":"Gr\u00fcnewald, D., and Simmendinger, C. (2013, January 3\u20134). The GASPI API specification and its implementation GPI 2.0. Proceedings of the 7th International Conference on PGAS Programming Models, Edinburgh, UK."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Herold, C., Krzikalla, O., and Kn\u00fcpfer, A. (June, January 29). Optimizing One-Sided Communication of Parallel Applications Using Critical Path Methods. Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, USA.","DOI":"10.1109\/IPDPSW.2017.64"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chen, J., and Clapp, R.M. (2015, January 29\u201331). Critical-path candidates: Scalable performance modeling for MPI workloads. Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Philadelphia, PA, USA.","DOI":"10.1109\/ISPASS.2015.7095779"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"100001","DOI":"10.1016\/j.tbench.2021.100001","article-title":"Workflow Critical Path: A data-oriented critical path metric for Holistic HPC Workflows","volume":"1","author":"Nguyen","year":"2021","journal-title":"BenchCounc. Trans. Benchmarks Stand. Eval."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Voevodin, V., Sobolev, S., Yakobovskiy, M., and Shagaliev, R. (2022). Supercomputing: 8th Russian Supercomputing Days, RuSCDays 2022, Moscow, Russia, September 26\u201327, 2022, Revised Selected Papers, Springer International Publishing.","DOI":"10.1007\/978-3-031-22941-1"},{"key":"ref_26","unstructured":"Fieger, K., Balyo, T., Schulz, C., and Schreiber, D. (2019, January 16\u201317). Finding optimal longest paths by dynamic programming in parallel. Proceedings of the Twelfth Annual Symposium on Combinatorial Search, Napa, CA, USA."},{"key":"ref_27","unstructured":"Raggi, M. (2016). Finding long simple paths in a weighted digraph using pseudo-topological orderings. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Portugal, D., Antunes, C.H., and Rocha, R. (2010, January 10\u201313). A study of genetic algorithms for approximating the longest path in generic graphs. Proceedings of the 2010 IEEE International Conference on Systems, Man and Cybernetics, Istanbul, Turkey.","DOI":"10.1109\/ICSMC.2010.5641920"},{"key":"ref_29","unstructured":"Pjesivac-Grbovic, J., Angskun, T., Bosilca, G., Fagg, G., Gabriel, E., and Dongarra, J. (2005, January 4\u20138). Performance Analysis of MPI Collective Operations. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, Denver, CO, USA."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/s10586-007-0012-0","article-title":"Performance analysis of MPI collective operations","volume":"10","author":"Angskun","year":"2007","journal-title":"Clust. Comput."},{"key":"ref_31","unstructured":"Saif, T., and Parashar, M. (2004). Lecture Notes in Computer Science, Springer."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hoefler, T., Lumsdaine, A., and Rehm, W. (2007, January 10\u201316). Implementation and performance analysis of non-blocking collective operations for MPI. Proceedings of the 2007 ACM\/IEEE Conference on Supercomputing\u2014SC\u201907, Reno, NV, USA.","DOI":"10.1145\/1362622.1362692"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Ueno, K., and Suzumura, T. (2012, January 21\u201325). Highly scalable graph search for the Graph500 benchmark. Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing\u2014HPDC\u201912, Online.","DOI":"10.1145\/2287076.2287104"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1007\/BF01386390","article-title":"A note on two problems in connexion with graphs","volume":"1","author":"Dijkstra","year":"1959","journal-title":"Numer. Math."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1145\/368996.369025","article-title":"Topological sorting of large networks","volume":"5","author":"Kahn","year":"1962","journal-title":"Commun. ACM"},{"key":"ref_36","unstructured":"Aguilar, X. (2020). Performance Monitoring, Analysis, and Real-Time Introspection on Large-Scale Parallel Systems. [Ph.D. Thesis, KTH Royal Institute of Technology]."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Garcia, M., Corbalan, J., and Labarta, J. (2009, January 22\u201325). LeWI: A runtime balancing algorithm for nested parallelism. Proceedings of the 2009 International Conference on Parallel Processing, Vienna, Austria.","DOI":"10.1109\/ICPP.2009.56"},{"key":"ref_38","unstructured":"Arzt, P., Fischler, Y., Lehr, J.P., and Bischof, C. (2021). European Conference on Parallel Processing, Springer."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Pearce, O., Gamblin, T., De Supinski, B.R., Schulz, M., and Amato, N.M. (2012, January 25\u201329). Quantifying the effectiveness of load balance algorithms. Proceedings of the 26th ACM International Conference on Supercomputing, Venice, Italy.","DOI":"10.1145\/2304576.2304601"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tallent, N.R., Adhianto, L., and Mellor-Crummey, J.M. (2010, January 13\u201319). Scalable identification of load imbalance in parallel executions using call path profiles. Proceedings of the SC\u201910: 2010 ACM\/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA.","DOI":"10.1109\/SC.2010.47"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Schmitt, F., Stolle, J., and Dietrich, R. (2014, January 9\u201312). CASITA: A tool for identifying critical optimization targets in distributed heterogeneous applications. Proceedings of the 2014 43rd International Conference on Parallel Processing Workshops, Minneapolis, MN, USA.","DOI":"10.1109\/ICPPW.2014.35"},{"key":"ref_42","unstructured":"Wolf, F., and Mohr, B. (2000). European Conference on Parallel Processing, Springer."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1016\/S1383-7621(03)00102-4","article-title":"Automatic performance analysis of hybrid MPI\/OpenMP applications","volume":"49","author":"Wolf","year":"2003","journal-title":"J. Syst. Archit."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Schmitt, F., Dietrich, R., and Juckeland, G. (2014, January 19\u201323). Scalable critical path analysis for hybrid MPI-CUDA applications. Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, Phoenix, AZ, USA.","DOI":"10.1109\/IPDPSW.2014.103"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Hermanns, M.A., Miklosch, M., B\u00f6hme, D., and Wolf, F. (2013, January 15\u201318). Understanding the formation of wait states in applications with one-sided communication. Proceedings of the 20th European MPI Users\u2019 Group Meeting, Madrid, Spain.","DOI":"10.1145\/2488551.2488569"},{"key":"ref_46","unstructured":"(2023, October 18). OpenFOAM Tutorial, pitzDailyExptInlet Case Code Repository. Available online: https:\/\/github.com\/OpenFOAM\/OpenFOAM-9\/tree\/master\/tutorials\/\/incompressible\/simpleFoam\/pitzDailyExptInlet."},{"key":"ref_47","unstructured":"(2023, October 18). LAMMPS Benchmarks. Available online: https:\/\/docs.lammps.org\/Speed_bench.html."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/11\/505\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:14:47Z","timestamp":1760130887000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/11\/505"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,31]]},"references-count":47,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["a16110505"],"URL":"https:\/\/doi.org\/10.3390\/a16110505","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2023,10,31]]}}}