{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T18:19:24Z","timestamp":1765995564660,"version":"3.41.0"},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>\n            We give optimally fast\n            <jats:italic>O<\/jats:italic>\n            (log\n            <jats:italic>p<\/jats:italic>\n            ) time (per processor) algorithms for computing round-optimal broadcast schedules for message-passing parallel computing systems. This affirmatively answers difficult questions posed in a SPAA 2022 BA and a CLUSTER 2022 paper. We observe that the computed schedules and circulant communication graph can likewise be used for reduction, all-broadcast and all-reduction as well, leading to new, round-optimal algorithms for these problems. These observations affirmatively answer open questions posed in a CLUSTER 2023 paper.\n          <\/jats:p>\n          <jats:p>\n            The problem is to broadcast\n            <jats:italic>n<\/jats:italic>\n            indivisible blocks of data from a given root processor to all other processors in a (subgraph of a) fully connected network of\n            <jats:italic>p<\/jats:italic>\n            processors with fully bidirectional, simultaneous send-receive, one-ported communication capabilities. In this model,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(n-1+\\lceil \\log _2 p\\rceil\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            send-receive communication rounds are required. Our new algorithms work for any number of processors\n            <jats:italic>p<\/jats:italic>\n            , and compute for each processor in the network receive and send schedules each of size\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\lceil \\log _2 p\\rceil\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            that determine uniquely in\n            <jats:italic>O<\/jats:italic>\n            (1) time for each communication round the new block that the processor will receive, and the already received block it has to send. Schedule computations are done independently per processor without communication. The broadcast communication subgraph is an easily computable, directed,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\lceil \\log _2 p\\rceil\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -regular circulant graph also used elsewhere. We show how the schedule computations can be done in optimal time and space of\n            <jats:italic>O<\/jats:italic>\n            (log\n            <jats:italic>p<\/jats:italic>\n            ), improving significantly over previous results of\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(O(p\\log ^2 p)\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            and\n            <jats:italic>O<\/jats:italic>\n            (log\n            <jats:sup>3<\/jats:sup>\n            <jats:italic>p<\/jats:italic>\n            ), respectively, for any number of processors. The schedule computation and broadcast algorithms are simple to implement, but correctness and complexity are not obvious. By dividing a given input of\n            <jats:italic>m<\/jats:italic>\n            Bytes into\n            <jats:italic>n<\/jats:italic>\n            indivisible blocks, the schedules are used for new, best possible pipelined implementations of the MPI (Message-Passing Interface) collectives\n            <jats:sans-serif>MPI_Bcast<\/jats:sans-serif>\n            ,\n            <jats:sans-serif>MPI_Reduce<\/jats:sans-serif>\n            ,\n            <jats:sans-serif>MPI_Allgatherv<\/jats:sans-serif>\n            , and\n            <jats:sans-serif>MPI_Reduce_scatter<\/jats:sans-serif>\n            . Indicative, experimental results are given. The schedules can also be used for replicating large arrays on an EREW PRAM.\n          <\/jats:p>","DOI":"10.1145\/3735139","type":"journal-article","created":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T11:13:13Z","timestamp":1746616393000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, Reduction, All-Broadcast and All-Reduction"],"prefix":"10.1145","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4864-9226","authenticated-orcid":false,"given":"Jesper Larsson","family":"Tr\u00e4ff","sequence":"first","affiliation":[{"name":"Faculty of Informatics, Institute of Computer Engineering, Research Group Parallel Computing, TU Wien, Vienna (Wien), Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,6,14]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/0166-218X(94)90001-9"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626493000046"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0166-218X(99)00155-9"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1995.1018"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1383-7621(03)00059-6"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1142\/S012962649300037X"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.642949"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1206"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2009.09.002"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/12.29465"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1002\/net.3230260409"},{"key":"e_1_3_3_13_2","volume-title":"MPI: A Message-Passing Interface Standard. Version 4.1","author":"Forum MPI","year":"2023","unstructured":"MPI Forum. 2023. MPI: A Message-Passing Interface Standard. Version 4.1. Retrieved March 25, 2025 from www.mpi-forum.org"},{"key":"e_1_3_3_14_2","first-page":"100","volume-title":"20th International Parallel and Distributed Processing Symposium (IPDPS\u201906)","author":"Ritzdorf Hubert","year":"2006","unstructured":"Hubert Ritzdorf and Jesper Larsson Tr\u00e4ff. 2006. Collective operations in NEC\u2019s high-performance MPI libraries. In 20th International Parallel and Distributed Processing Symposium (IPDPS\u201906). 100."},{"key":"e_1_3_3_15_2","first-page":"143","volume-title":"34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA\u201922)","author":"Tr\u00e4ff Jesper Larsson","year":"2022","unstructured":"Jesper Larsson Tr\u00e4ff. 2022. Brief announcement: Fast(er) construction of round-optimal n-block broadcast schedules. In 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA\u201922). ACM, 143\u2013146."},{"key":"e_1_3_3_16_2","first-page":"142","volume-title":"IEEE International Conference on Cluster Computing (CLUSTER\u201922)","author":"Tr\u00e4ff Jesper Larsson","year":"2022","unstructured":"Jesper Larsson Tr\u00e4ff. 2022. Fast(er) construction of round-optimal n-block broadcast schedules. In IEEE International Conference on Cluster Computing (CLUSTER\u201922). IEEE Computer Society, 142\u2013151."},{"key":"e_1_3_3_17_2","article-title":"Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms","author":"Tr\u00e4ff Jesper Larsson","year":"2024","unstructured":"Jesper Larsson Tr\u00e4ff. 2024. Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms. arXiv:2410.14234. Retrieved from https:\/\/arxiv.org\/abs\/2410.14234","journal-title":"arXiv:2410.14234"},{"key":"e_1_3_3_18_2","first-page":"270","volume-title":"IEEE International Conference on Cluster Computing (CLUSTER\u201920)","author":"Tr\u00e4ff Jesper Larsson","year":"2020","unstructured":"Jesper Larsson Tr\u00e4ff and Sascha Hunold. 2020. Decomposing MPI collectives for exploiting multi-lane communication. In IEEE International Conference on Cluster Computing (CLUSTER\u201920). IEEE Computer Society, 270\u2013280."},{"key":"e_1_3_3_19_2","first-page":"284","volume-title":"IEEE International Conference on Cluster Computing (CLUSTER\u201923)","author":"Tr\u00e4ff Jesper Larsson","year":"2023","unstructured":"Jesper Larsson Tr\u00e4ff, Sascha Hunold, Nikolaus Manes Funk, and Ioannis Vardas. 2023. Uniform algorithms for reduce-scatter and (most) other collectives for MPI. In IEEE International Conference on Cluster Computing (CLUSTER\u201923). IEEE Computer Society, 284\u2013294."},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2007.12.001"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009359013"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3735139","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,14]],"date-time":"2025-06-14T12:01:52Z","timestamp":1749902512000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3735139"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,14]]},"references-count":20,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3735139"],"URL":"https:\/\/doi.org\/10.1145\/3735139","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2025,6,14]]},"assertion":[{"value":"2024-07-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}