{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:17:13Z","timestamp":1750220233521,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,7,17]],"date-time":"2021-07-17T00:00:00Z","timestamp":1626480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PATs) are presented. The first one, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and exploits an auxiliary background thread for early data exchange from faster processes to accelerate the performed all-gather operation. The other algorithm, Background Sorted Linear synchronized tree with Broadcast (BSLB), is built upon the already existing PAP-aware gather algorithm, that is, Background Sorted Linear Synchronized tree (BSLS), followed by a regular broadcast distributing gathered data to all participating processes. The background of the imbalanced PAP subject is described, along with the PAP monitoring and evaluation topics. An experimental evaluation of the algorithms based on a proposed mini-benchmark is presented. The mini-benchmark was performed over 2,000 times in a typical HPC cluster architecture with homogeneous compute nodes. The obtained results are analyzed according to different PATs, data sizes, and process numbers, showing that the proposed optimization works well for various configurations, is scalable, and can significantly reduce the all-gather elapsed times, in our case, up to factor 1.9 or 47% in comparison with the best state-of-the-art solution.<\/jats:p>","DOI":"10.1145\/3460122","type":"journal-article","created":{"date-parts":[[2021,7,17]],"date-time":"2021-07-17T10:05:22Z","timestamp":1626516322000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns"],"prefix":"10.1145","volume":"18","author":[{"given":"Jerzy","family":"Proficz","sequence":"first","affiliation":[{"name":"Gdansk University of Technology, Centre of Informatics\u2014Tricity Academic Supercomputer &amp; networK (CI TASK), Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,7,17]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_2_1_1_1","DOI":"10.1109\/HPCASIA.2005.75"},{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1147\/JRD.2019.2947013"},{"key":"e_1_2_1_3_1","volume-title":"MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems. Simul. Model. Pract. Theor. 77 (Sep.","author":"Czarnul Pawe\u0142","year":"2017","unstructured":"Pawe\u0142 Czarnul , Jaros\u0142aw Kuchta , Mariusz Matuszek , Jerzy Proficz , Pawe\u0142 Ro\u015bciszewski , Micha\u0142 W\u00f3jcik , and Julian Szyma\u0144ski . 2017 . MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems. Simul. Model. Pract. Theor. 77 (Sep. 2017), 124\u2013140. DOI:https:\/\/doi.org\/10.1016\/j.simpat.2017.05.009 10.1016\/j.simpat.2017.05.009 Pawe\u0142 Czarnul, Jaros\u0142aw Kuchta, Mariusz Matuszek, Jerzy Proficz, Pawe\u0142 Ro\u015bciszewski, Micha\u0142 W\u00f3jcik, and Julian Szyma\u0144ski. 2017. MERPSYS: An environment for simulation of parallel application execution on large scale HPC systems. Simul. Model. Pract. Theor. 77 (Sep. 2017), 124\u2013140. DOI:https:\/\/doi.org\/10.1016\/j.simpat.2017.05.009"},{"doi-asserted-by":"publisher","key":"e_1_2_1_4_1","DOI":"10.1007\/s10766-008-0070-9"},{"unstructured":"InfiniBand. [n.d.]. InfiniBand Trade Assosciation. Retrieved on 2021 from https:\/\/www.infinibandta.org\/.  InfiniBand. [n.d.]. InfiniBand Trade Assosciation. Retrieved on 2021 from https:\/\/www.infinibandta.org\/.","key":"e_1_2_1_5_1"},{"unstructured":"Intel-MPI-benchmark. [n.d.]. Intel MPI benchmark. Retrieved on 2021 from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/intel-mpi-benchmarks.html.  Intel-MPI-benchmark. [n.d.]. Intel MPI benchmark. Retrieved on 2021 from https:\/\/software.intel.com\/content\/www\/us\/en\/develop\/articles\/intel-mpi-benchmarks.html.","key":"e_1_2_1_6_1"},{"unstructured":"IntelMPI. [n.d.]. Intel MPI library homepage. Retrieved from https:\/\/software.intel.com\/en-us\/mpi-library.  IntelMPI. [n.d.]. Intel MPI library homepage. Retrieved from https:\/\/software.intel.com\/en-us\/mpi-library.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","volume-title":"Reda Al-Bahrani, Ankit Agrawal, Alok Choudhary, and Wei-keng Liao.","author":"Kang Qiao","year":"2019","unstructured":"Qiao Kang , Jesper Larsson Tr\u00e4ff , Reda Al-Bahrani, Ankit Agrawal, Alok Choudhary, and Wei-keng Liao. 2019 . Scalable algorithms for MPI intergroup allgather and allgatherv. Parallel Comput . 85 (July 2019), 220\u2013230. DOI:https:\/\/doi.org\/10.1016\/j.parco.2019.04.015 10.1016\/j.parco.2019.04.015 Qiao Kang, Jesper Larsson Tr\u00e4ff, Reda Al-Bahrani, Ankit Agrawal, Alok Choudhary, and Wei-keng Liao. 2019. Scalable algorithms for MPI intergroup allgather and allgatherv. Parallel Comput. 85 (July 2019), 220\u2013230. DOI:https:\/\/doi.org\/10.1016\/j.parco.2019.04.015"},{"key":"e_1_2_1_9_1","first-page":"3","article-title":"Tryton supercomputer capabilities for analysis of massive data streams","volume":"22","author":"Krawczyk Henryk","year":"2015","unstructured":"Henryk Krawczyk , Micha\u0142 Nykiel , and Jerzy Proficz . 2015 . Tryton supercomputer capabilities for analysis of massive data streams . Polish Marit. Res. 22 , 3 (Jan. 2015), 99\u2013104. DOI:https:\/\/doi.org\/10.1515\/pomr-2015-0062 10.1515\/pomr-2015-0062 Henryk Krawczyk, Micha\u0142 Nykiel, and Jerzy Proficz. 2015. Tryton supercomputer capabilities for analysis of massive data streams. Polish Marit. Res. 22, 3 (Jan. 2015), 99\u2013104. DOI:https:\/\/doi.org\/10.1515\/pomr-2015-0062","journal-title":"Polish Marit. Res."},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1109\/HPCS48598.2019.9188149"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the IEEE International Conference on Cluster Computing. IEEE, 135\u2013144","author":"Mamidala A. R.","year":"2004","unstructured":"A. R. Mamidala , Jiuxing Liu , and D. K. Panda . 2004. Efficient barrier and allreduce on infiniband clusters using multicast and adaptive algorithms . In Proceedings of the IEEE International Conference on Cluster Computing. IEEE, 135\u2013144 . DOI:https:\/\/doi.org\/10.1109\/CLUSTR. 2004 .1392611 10.1109\/CLUSTR.2004.1392611 A. R. Mamidala, Jiuxing Liu, and D. K. Panda. 2004. Efficient barrier and allreduce on infiniband clusters using multicast and adaptive algorithms. In Proceedings of the IEEE International Conference on Cluster Computing. IEEE, 135\u2013144. DOI:https:\/\/doi.org\/10.1109\/CLUSTR.2004.1392611"},{"key":"e_1_2_1_12_1","series-title":"Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).","volume-title":"An investigation into the performance of reduction algorithms under load imbalance","author":"Marendi\u0107 Petar","unstructured":"Petar Marendi\u0107 , Jan Lemeire , Tom Haber , Dean Vu\u010dini\u0107 , and Peter Schelkens . 2012. An investigation into the performance of reduction algorithms under load imbalance . In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7484 LNCS. Springer Berlin , 439\u2013450. DOI:https:\/\/doi.org\/10.1007\/978-3-642-32820-6_44 10.1007\/978-3-642-32820-6_44 Petar Marendi\u0107, Jan Lemeire, Tom Haber, Dean Vu\u010dini\u0107, and Peter Schelkens. 2012. An investigation into the performance of reduction algorithms under load imbalance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7484 LNCS. Springer Berlin, 439\u2013450. DOI:https:\/\/doi.org\/10.1007\/978-3-642-32820-6_44"},{"doi-asserted-by":"crossref","unstructured":"P. Marendic J. Lemeire D. Vucinic and P. Schelkens. 2016. A novel MPI reduction algorithm resilient to imbalances in process arrival times. J. Supercomput. (2016). DOI:https:\/\/doi.org\/10.1007\/s11227-016-1707-x 10.1007\/s11227-016-1707-x","key":"#cr-split#-e_1_2_1_13_1.1","DOI":"10.1007\/s11227-016-1707-x"},{"doi-asserted-by":"crossref","unstructured":"P. Marendic J. Lemeire D. Vucinic and P. Schelkens. 2016. A novel MPI reduction algorithm resilient to imbalances in process arrival times. J. Supercomput. (2016). DOI:https:\/\/doi.org\/10.1007\/s11227-016-1707-x","key":"#cr-split#-e_1_2_1_13_1.2","DOI":"10.1007\/s11227-016-1707-x"},{"unstructured":"MPI. [n.d.]. The standarization forum for Messsage Passing Interface (MPI). Retrieved from http:\/\/mpi-forum.org\/.  MPI. [n.d.]. The standarization forum for Messsage Passing Interface (MPI). Retrieved from http:\/\/mpi-forum.org\/.","key":"e_1_2_1_14_1"},{"unstructured":"MPICH. [n.d.]. MPICH High-Performance Portable MPI. Retrieved on 2021 from https:\/\/www.mpich.org\/.  MPICH. [n.d.]. MPICH High-Performance Portable MPI. Retrieved on 2021 from https:\/\/www.mpich.org\/.","key":"e_1_2_1_15_1"},{"unstructured":"OpenMP. [n.d.]. The OpenMP API specification for parallel programming. Retrieved on 2021 from https:\/\/www.openmp.org\/.  OpenMP. [n.d.]. The OpenMP API specification for parallel programming. Retrieved on 2021 from https:\/\/www.openmp.org\/.","key":"e_1_2_1_16_1"},{"unstructured":"OpenMPI. [n.d.]. Open MPI: Open Source High Performance Computing. Retrieved on 2021 from https:\/\/www.open-mpi.org\/.  OpenMPI. [n.d.]. Open MPI: Open Source High Performance Computing. Retrieved on 2021 from https:\/\/www.open-mpi.org\/.","key":"e_1_2_1_17_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.1109\/IPDPS.2008.4536308"},{"unstructured":"POSIX threads. [n.d.]. POSIX threads programming. Retrieved on 2021 from https:\/\/computing.llnl.gov\/tutorials\/pthreads\/.  POSIX threads. [n.d.]. POSIX threads programming. Retrieved on 2021 from https:\/\/computing.llnl.gov\/tutorials\/pthreads\/.","key":"e_1_2_1_19_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1007\/s11227-018-2356-z"},{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.1007\/s10586-019-03040-x"},{"key":"e_1_2_1_22_1","series-title":"Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).","volume-title":"Performance and power-aware modeling of MPI Applications for cluster computing","author":"Proficz Jerzy","unstructured":"Jerzy Proficz and Pawe\u0142 Czarnul . 2016. Performance and power-aware modeling of MPI Applications for cluster computing . In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9574 . Springer , Cham , 199\u2013209. DOI:https:\/\/doi.org\/10.1007\/978-3-319-32152-3_19 10.1007\/978-3-319-32152-3_19 Jerzy Proficz and Pawe\u0142 Czarnul. 2016. Performance and power-aware modeling of MPI Applications for cluster computing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9574. Springer, Cham, 199\u2013209. DOI:https:\/\/doi.org\/10.1007\/978-3-319-32152-3_19"},{"key":"e_1_2_1_23_1","volume-title":"Ocetkiewicz","author":"Proficz Jerzy","year":"2020","unstructured":"Jerzy Proficz and Krzysztof M . Ocetkiewicz . 2020 . Improving Clairvoyant: reduction algorithm resilient to imbalanced process arrival patterns. J. Supercomput . (Nov. 2020). DOI:https:\/\/doi.org\/10.1007\/s11227-020-03499-1 10.1007\/s11227-020-03499-1 Jerzy Proficz and Krzysztof M. Ocetkiewicz. 2020. Improving Clairvoyant: reduction algorithm resilient to imbalanced process arrival patterns. J. Supercomput. (Nov. 2020). DOI:https:\/\/doi.org\/10.1007\/s11227-020-03499-1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1007\/978-3-030-44041-1_72"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.1007\/s10586-008-0065-8"},{"doi-asserted-by":"publisher","key":"e_1_2_1_26_1","DOI":"10.1007\/s10766-010-0152-3"},{"unstructured":"SLURM.[n.d.]. SLURM Workload Manager. Retrieved on 2021 from https:\/\/slurm.schedmd.com\/.  SLURM.[n.d.]. SLURM Workload Manager. Retrieved on 2021 from https:\/\/slurm.schedmd.com\/.","key":"e_1_2_1_27_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1177\/1094342005051521"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1145\/3339186.3339199"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460122","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460122","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:18Z","timestamp":1750188618000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460122"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,17]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3460122"],"URL":"https:\/\/doi.org\/10.1145\/3460122","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2021,7,17]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}