{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T08:33:28Z","timestamp":1777106008654,"version":"3.51.4"},"publisher-location":"Cham","reference-count":16,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783030507428","type":"print"},{"value":"9783030507435","type":"electronic"}],"license":[{"start":{"date-parts":[[2020,1,1]],"date-time":"2020-01-01T00:00:00Z","timestamp":1577836800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,15]],"date-time":"2020-06-15T00:00:00Z","timestamp":1592179200000},"content-version":"vor","delay-in-days":166,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>\n\nThis paper describes the new hardware-based streaming-aggregation capability added to Mellanox\u2019s Scalable Hierarchical Aggregation and Reduction Protocol in its HDR InfiniBand switches. For large messages, this capability is designed to achieve reduction bandwidths similar to those of point-to-point messages of the same size, and complements the latency-optimized low-latency aggregation reduction capabilities, aimed at small data reductions. <jats:italic>MPI_Allreduce()<\/jats:italic> bandwidth measured on an HDR InfiniBand based system achieves about 95% of network bandwidth. For medium and large data reduction this also improves the reduction bandwidth by a factor of 2\u20135 relative to host-based (e.g., software-based) reduction algorithms. Using this capability also increased DL-Poly and PyTorch application performance by as much as 4% and 18%, respectively. This paper describes SHARP Streaming-Aggregation hardware architecture and a set of synthetic and application benchmarks used to study this new reduction capability, and the range of data sizes for which Streaming-Aggregation performs better than the low-latency aggregation algorithm.<\/jats:p>","DOI":"10.1007\/978-3-030-50743-5_3","type":"book-chapter","created":{"date-parts":[[2020,6,15]],"date-time":"2020-06-15T19:03:45Z","timestamp":1592247825000},"page":"41-59","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["Scalable Hierarchical Aggregation and\u00a0Reduction Protocol (SHARP)TM Streaming-Aggregation Hardware Design and Evaluation"],"prefix":"10.1007","author":[{"given":"Richard L.","family":"Graham","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lion","family":"Levi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Devendar","family":"Burredy","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gil","family":"Bloch","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gilad","family":"Shainer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Cho","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George","family":"Elias","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Klein","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joshua","family":"Ladd","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ophir","family":"Maor","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ami","family":"Marelli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Valentin","family":"Petrov","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Evyatar","family":"Romlet","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yong","family":"Qin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ido","family":"Zemah","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,6,15]]},"reference":[{"key":"3_CR1","unstructured":"http:\/\/www.mpi-forum.org"},{"key":"3_CR2","unstructured":"http:\/\/www.openshmem.org"},{"key":"3_CR3","unstructured":"http:\/\/www.mellanox.com\/page\/products_dyn?product_family=189&mtag=hpc-x"},{"key":"3_CR4","unstructured":"http:\/\/mvapich.cse.ohio-state.edu\/benchmarks\/"},{"key":"3_CR5","unstructured":"https:\/\/github.com\/NVIDIA\/nccl"},{"key":"3_CR6","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1007\/s00450-012-0211-7","volume":"28","author":"T Adachi","year":"2006","unstructured":"Adachi, T., et al.: The design of ultra scalable MPI collective communication on the K computer. Comput. Sci. Res. Dev. 28, 147\u2013155 (2006). https:\/\/doi.org\/10.1007\/s00450-012-0211-7","journal-title":"Comput. Sci. Res. Dev."},{"key":"3_CR7","unstructured":"Elias, G., Levi, L., Romlet, E., Marelli, A.: Parallel computation network device. US Patent 16\/357,356. Filed 19 March 2019"},{"key":"3_CR8","doi-asserted-by":"crossref","unstructured":"Gao, S., Schmidt, A.G., Sass, R.: Impact of reconfigurable hardware on accelerating MPI$$\\_$$Reduce. In: 2010 International Conference on Field-Programmable Technology, pp. 29\u201336 (2010)","DOI":"10.1109\/FPT.2010.5681537"},{"key":"3_CR9","doi-asserted-by":"crossref","unstructured":"Graham, R., et al.: Scalable hierarchical aggregation protocol (SHArP): a hardware architecture for efficient data reduction. In: 2016 First International Workshop on Communication Optimizations in HPC (COMHPC), COM-HPC 2016, pp. 1\u201310, November 2016","DOI":"10.1109\/COMHPC.2016.006"},{"key":"3_CR10","doi-asserted-by":"crossref","unstructured":"Graham, R.L., et al.: ConnectX-2 infiniband management queues: first investigation of the new support for network offloaded collective operations. In: Proceedings of the 2010 10th IEEE\/ACM International Conference on Cluster, Cloud and Grid Computing, CCGRID 2010, pp. 53\u201362 (2010)","DOI":"10.1109\/CCGRID.2010.9"},{"issue":"4","key":"3_CR11","doi-asserted-by":"publisher","first-page":"450","DOI":"10.1177\/1094342014552086","volume":"28","author":"S Kumar","year":"2014","unstructured":"Kumar, S., Mamidala, A., Heidelberger, P., Chen, D., Faraj, D.: Optimization of MPI collective operations on the IBM blue Gene\/Q supercomputer. Int. J. High Perform. Comput. Appl. 28(4), 450\u2013464 (2014)","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"3_CR12","unstructured":"Paszke, A., et. al.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., Alche-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 8024\u20138035. Curran Associates, Inc. (2019). http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"3_CR13","unstructured":"Stern, J.A., Xiong, Q., Skjellum, A.: A novel approach to supporting communicators for in-switch processing of MPI collectives. In: Workshop on Exascale MPI (2019)"},{"key":"3_CR14","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1177\/1094342005051521","volume":"19","author":"R Thakur","year":"2005","unstructured":"Thakur, R., Rabenseifner, R.: Optimization of collective communication operations in MPICH. Int. J. High Perform. Comput. Appl. 19, 49\u201366 (2005)","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"3_CR15","doi-asserted-by":"publisher","first-page":"1911","DOI":"10.1039\/b517931a","volume":"16","author":"I Todorov","year":"2006","unstructured":"Todorov, I., Smith, W., Trachenko, K., Dove, M.: J. Mater. Chem. 16, 1911\u20131918 (2006)","journal-title":"J. Mater. Chem."},{"key":"3_CR16","unstructured":"Vaswani, A., et al.: Attention is all you need. CoRR abs\/1706.03762 (2017). https:\/\/arxiv.org\/abs\/1706.03762"}],"container-title":["Lecture Notes in Computer Science","High Performance Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-50743-5_3","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,18]],"date-time":"2023-12-18T20:05:00Z","timestamp":1702929900000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-50743-5_3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020]]},"ISBN":["9783030507428","9783030507435"],"references-count":16,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-50743-5_3","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020]]},"assertion":[{"value":"15 June 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ISC High Performance","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on High Performance Computing","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Frankfurt am Main","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Germany","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2020","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"22 June 2020","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"25 June 2020","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"35","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"supercomputing2020","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/www.isc-hpc.com\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Linklings","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"87","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"27","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"31% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.73","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"4.33","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"The conference was held virtually due to the COVID-19 pandemic.","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}