{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T16:25:46Z","timestamp":1776529546748,"version":"3.51.2"},"reference-count":35,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2014,11,1]],"date-time":"2014-11-01T00:00:00Z","timestamp":1414800000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2014,11]]},"abstract":"<jats:p> The Blue Gene\/Q (BG\/Q) machine is the latest in the line of IBM massively parallel supercomputers, designed to scale to 262,144 nodes and 16 million threads. Each BG\/Q node has 68 hardware threads. Hybrid programming paradigms, which use message passing among nodes and multi-threading within nodes, enable applications to achieve high throughput on BG\/Q. In this paper, we present scalable algorithms to optimize MPI collective operations by taking advantage of the various features of the BG\/Q torus and collective networks. We achieve an 8 byte double-sum MPI_Allreduce latency of 10.25\u2009\u00b5s on 1,572,864\u2009MPI ranks. We accelerate summing of network packets with local buffers by the use of the Quad Processing SIMD unit in the BG\/Q cores and executing the sums on multiple communication threads supported by the optimized communication libraries. The achieved net gain is a peak throughput of 6.3\u2009GB\/s for double-sum allreduce. We also achieve over 90% of network peak for MPI_Alltoall with 65,536\u2009MPI ranks. <\/jats:p>","DOI":"10.1177\/1094342014552086","type":"journal-article","created":{"date-parts":[[2014,11,7]],"date-time":"2014-11-07T09:20:18Z","timestamp":1415352018000},"page":"450-464","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":24,"title":["Optimization of MPI collective operations on the IBM Blue Gene\/Q supercomputer"],"prefix":"10.1177","volume":"28","author":[{"given":"Sameer","family":"Kumar","sequence":"first","affiliation":[{"name":"IBM India Research Center, Nagawara, Bangalore, India"}]},{"given":"Amith","family":"Mamidala","sequence":"additional","affiliation":[{"name":"IBM India Research Center, Nagawara, Bangalore, India"}]},{"given":"Philip","family":"Heidelberger","sequence":"additional","affiliation":[{"name":"IBM India Research Center, Nagawara, Bangalore, India"}]},{"given":"Dong","family":"Chen","sequence":"additional","affiliation":[{"name":"IBM India Research Center, Nagawara, Bangalore, India"}]},{"given":"Daniel","family":"Faraj","sequence":"additional","affiliation":[{"name":"Intel Technical Computing Group, Edina, MN, USA"}]}],"member":"179","published-online":{"date-parts":[[2014,11,7]]},"reference":[{"key":"bibr1-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088183"},{"key":"bibr2-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1109\/71.963419"},{"key":"bibr3-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2006.31"},{"key":"bibr4-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2006.31"},{"key":"bibr5-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063419"},{"key":"bibr6-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.72"},{"key":"bibr7-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2011.96"},{"key":"bibr8-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/1274971.1274996"},{"key":"bibr9-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183431"},{"key":"bibr10-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1147\/rd.492.0195"},{"key":"bibr11-1094342014552086","unstructured":"Gropp W, Lusk E (1995) MPICH ADI implementation reference manual."},{"key":"bibr12-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8191(96)00024-5"},{"key":"bibr13-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1007\/BF01379320"},{"key":"bibr14-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362692"},{"issue":"1","key":"bibr15-1094342014552086","first-page":"199","volume":"52","author":"IBM Blue Gene Team","year":"2008","journal-title":"IBM Journal of Research and Development"},{"key":"bibr16-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1147\/JRD.2012.2222991"},{"key":"bibr17-1094342014552086","doi-asserted-by":"crossref","first-page":"175","DOI":"10.7551\/mitpress\/5241.003.0009","volume-title":"Parallel programming using C++","author":"Kale LV","year":"1996"},{"key":"bibr18-1094342014552086","volume-title":"International conference on parallel processing (ICPP\u201913)","author":"Kandalla KC","year":"2013"},{"key":"bibr19-1094342014552086","volume-title":"International parallel and distributed processing symposium (IPDPS\u201912)","author":"Kandalla K","year":"2012"},{"key":"bibr20-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/2488551.2488557"},{"key":"bibr21-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87475-1_10"},{"key":"bibr22-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/1375527.1375544"},{"key":"bibr23-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009359011"},{"key":"bibr24-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.73"},{"key":"bibr25-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2008.83"},{"key":"bibr26-1094342014552086","unstructured":"Lawrence Livermore National Laboratory (2012) Advanced simulation and computing (ASC) Sequoia: sustainable stockpile stewardship. Available at: https:\/\/asc.llnl.gov\/computing_resources\/sequoia\/ (accessed ?)."},{"key":"bibr27-1094342014552086","unstructured":"Lawrence Livermore National Laboratory (2013) ASC Sequoia benchmark codes. Available at: https:\/\/asc.llnl.gov\/sequoia\/benchmarks\/ (accessed ?)."},{"key":"bibr28-1094342014552086","volume-title":"Proceedings of the international conference on Supercomputing (ICS 2012)","author":"Mittal A","year":"2012"},{"key":"bibr29-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0097937"},{"key":"bibr30-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/1345206.1345224"},{"issue":"1","key":"bibr31-1094342014552086","first-page":"55","volume":"57","author":"Ryu KD","year":"2013","journal-title":"IBM Journal of Research and Development"},{"key":"bibr32-1094342014552086","volume-title":"High-performance computing in the Asia-Pacific region","author":"Takahashi D","year":"2000"},{"key":"bibr33-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304605"},{"key":"bibr34-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1177\/1094342005051521"},{"key":"bibr35-1094342014552086","doi-asserted-by":"publisher","DOI":"10.1145\/1188455.1188507"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014552086","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342014552086","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342014552086","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T05:38:42Z","timestamp":1741066722000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342014552086"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,11]]},"references-count":35,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,11]]}},"alternative-id":["10.1177\/1094342014552086"],"URL":"https:\/\/doi.org\/10.1177\/1094342014552086","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,11]]}}}