{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T10:45:58Z","timestamp":1760784358396,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,11,12]],"date-time":"2017-11-12T00:00:00Z","timestamp":1510444800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006192","name":"Advanced Scientific Computing Research","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006192","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,11,12]]},"DOI":"10.1145\/3126908.3126963","type":"proceedings-article","created":{"date-parts":[[2017,11,8]],"date-time":"2017-11-08T21:02:30Z","timestamp":1510174950000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":22,"title":["Why is MPI so slow?"],"prefix":"10.1145","author":[{"given":"Ken","family":"Raffenetti","sequence":"first","affiliation":[{"name":"Argonne National Laboratory"}]},{"given":"Abdelhalim","family":"Amer","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory"}]},{"given":"Lena","family":"Oden","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory"}]},{"given":"Charles","family":"Archer","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Wesley","family":"Bland","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Hajime","family":"Fujita","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Yanfei","family":"Guo","sequence":"additional","affiliation":[{"name":"Argonne National 
Laboratory"}]},{"given":"Tomislav","family":"Janjusic","sequence":"additional","affiliation":[{"name":"Mellanox Technologies"}]},{"given":"Dmitry","family":"Durnov","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Michael","family":"Blocksome","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Min","family":"Si","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory"}]},{"given":"Sangmin","family":"Seo","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory"}]},{"given":"Akhil","family":"Langer","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Gengbin","family":"Zheng","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Masamichi","family":"Takagi","sequence":"additional","affiliation":[{"name":"RIKEN Advanced Institute of Computational Science"}]},{"given":"Paul","family":"Coffman","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory"}]},{"given":"Jithin","family":"Jose","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Sayantan","family":"Sur","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Alexander","family":"Sannikov","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Sergey","family":"Oblomov","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Michael","family":"Chuvelev","sequence":"additional","affiliation":[{"name":"Intel Corporation"}]},{"given":"Masayuki","family":"Hatanaka","sequence":"additional","affiliation":[{"name":"RIKEN Advanced Institute of Computational Science"}]},{"given":"Xin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Mellanox Technologies"}]},{"given":"Paul","family":"Fischer","sequence":"additional","affiliation":[{"name":"University of 
Illinois"}]},{"given":"Thilina","family":"Rathnayake","sequence":"additional","affiliation":[{"name":"University of Illinois"}]},{"given":"Matt","family":"Otten","sequence":"additional","affiliation":[{"name":"Cornell University"}]},{"given":"Misun","family":"Min","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory"}]},{"given":"Pavan","family":"Balaji","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory"}]}],"member":"320","published-online":{"date-parts":[[2017,11,12]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2017. Center for Exascale Simulation of Advanced Reactors. https:\/\/cesar.mcs.anl.gov. (2017)."},{"key":"e_1_3_2_1_2_1","unstructured":"2017. Center for Exascale Simulation of Combustion in Turbulence. https:\/\/science.energy.gov\/ascr\/research\/scidac\/co-design\/. (2017)."},{"key":"e_1_3_2_1_3_1","volume-title":"https:\/\/asc.llnl.gov\/CORAL-benchmarks","author":"Benchmarks CORAL","year":"2017","unstructured":"2017. CORAL Benchmarks. https:\/\/asc.llnl.gov\/CORAL-benchmarks. (2017)."},{"key":"e_1_3_2_1_4_1","unstructured":"2017. Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). https:\/\/codesign.llnl.gov\/lulesh.php. (2017)."},{"key":"e_1_3_2_1_5_1","unstructured":"2017. Monte Carlo Benchmark (MCB). https:\/\/codesign.llnl.gov\/mcb.php. (2017)."},{"key":"e_1_3_2_1_6_1","unstructured":"2017. NAS Parallel Benchmarks. http:\/\/www.nas.nasa.gov\/publications\/npb.html. (2017)."},{"key":"e_1_3_2_1_7_1","unstructured":"2017. Nekbone. https:\/\/cesar.mcs.anl.gov\/content\/software\/thermal_hydraulics. (2017)."},{"key":"e_1_3_2_1_8_1","unstructured":"2017. QMCPack. http:\/\/qmcpack.org. (2017)."},{"key":"e_1_3_2_1_9_1","unstructured":"2017. The Local-Self-Consistent Multiple-Scattering (LSMS) Code. https:\/\/www.ccs.ornl.gov\/mri\/repository\/LSMS\/index.html. (2017)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2006.56"},{"key":"e_1_3_2_1_11_1","unstructured":"Abdelhalim Amer, Pavan Balaji, Wesley Bland, William Gropp, Rob Latham, Huiwei Lu, Lena Oden, Antonio Pena, Ken Raffenetti, Sangmin Seo, et al. 2015. MPICH User's Guide. (2015)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009360206"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Brian W Barrett, Ron Brightwell, Ryan Grant, Simon D Hammond, and K Scott Hemmert. 2014. An evaluation of MPI message rate on hybrid-core processors. (2014).","DOI":"10.1177\/1094342014552085"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/11846802_36"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488551.2488553"},{"volume-title":"22nd AIAA Computational Fluid Dynamics Conference, AIAA Aviation. AIAA 2015--3049","author":"Fischer P.","key":"e_1_3_2_1_16_1","unstructured":"P. Fischer, K. Heisey, and M. Min. 2015. Scaling Limits for PDE-Based Simulation (Invited). In 22nd AIAA Computational Fluid Dynamics Conference, AIAA Aviation. AIAA 2015--3049."},{"key":"e_1_3_2_1_17_1","unstructured":"P. Fischer, J. Lottes, and S. Kerkemeier. 2008. Nek5000: Open source spectral element CFD solver. http:\/\/nek5000.mcs.anl.gov and https:\/\/github.com\/Nek5000\/nek5000. (2008)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/0021-9991(91)90216-8"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-41321-1_15"},{"volume-title":"Using Advanced MPI: Modern Features of the Message-Passing Interface","author":"Gropp William","key":"e_1_3_2_1_20_1","unstructured":"William Gropp, Torsten Hoefler, Rajeev Thakur, and Ewing Lusk. 2004. Using Advanced MPI: Modern Features of the Message-Passing Interface. MIT Press."},{"volume-title":"Using MPI-2: Advanced Features of the Message-Passing Interface","author":"Gropp William","key":"e_1_3_2_1_21_1","unstructured":"William Gropp, Ewing Lusk, and Rajeev Thakur. 1999. Using MPI-2: Advanced Features of the Message-Passing Interface. MIT Press."},{"key":"e_1_3_2_1_22_1","volume-title":"Memory Compression Techniques for Network Address Management in MPI. In IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Guo Yanfei","year":"2017","unstructured":"Yanfei Guo, Charles Archer, Michael Blocksome, Scott Parker, Wesley Bland, Kenneth J. Raffenetti, and Pavan Balaji. 2017. Memory Compression Techniques for Network Address Management in MPI. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). Orlando, Florida."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2504566"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2780584"},{"key":"e_1_3_2_1_26_1","volume-title":"MPI: A Message Passing Interface Standard.","author":"Forum MPI","year":"2015","unstructured":"MPI Forum. 2015. MPI: A Message Passing Interface Standard. (2015). http:\/\/www.mpi-forum.org\/docs\/docs.html."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"M. Otten, J. Gong, A. Mametjanov, A. Vose, J. Levesque, P. Fischer, and M. Min. 2016. An MPI\/OpenACC Implementation of a High Order Electromagnetics Solver with GPUDirect Communication. Int. J. High Perf. Comput. Appl. (2016).","DOI":"10.1177\/1094342015626584"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCS.2008.10"},{"volume-title":"Cluster Computing, 2003. Proceedings. 2003 IEEE International Conference on. IEEE, 412--419","author":"Xian-He","key":"e_1_3_2_1_29_1","unstructured":"Xian-He Sun et al. 2003. Improving the performance of MPI derived datatypes by optimizing memory-access cost. In Cluster Computing, 2003. Proceedings. 2003 IEEE International Conference on. IEEE, 412--419."},{"volume-title":"European Parallel Virtual Machine\/Message Passing Interface Users' Group Meeting","author":"Thakur Rajeev","key":"e_1_3_2_1_30_1","unstructured":"Rajeev Thakur and William D Gropp. 2003. Improving the performance of collective operations in MPICH. In European Parallel Virtual Machine\/Message Passing Interface Users' Group Meeting. Springer, 257--267."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331599"},{"key":"e_1_3_2_1_32_1","volume-title":"European MPI Users' Group Meeting. Springer, 208--217","author":"Compr\u00e9s Ure\u00f1a Isa\u00edas A","year":"2011","unstructured":"Isa\u00edas A Compr\u00e9s Ure\u00f1a, Michael Riepen, and Michael Konow. 2011. RCKMPI-lightweight MPI implementation for Intel's Single-chip Cloud Computer (SCC). In European MPI Users' Group Meeting. Springer, 208--217."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2010.04.018"}],"event":{"name":"SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis","sponsor":["SIGHPC ACM Special Interest Group on High Performance Computing, Special Interest Group on High Performance Computing","IEEE CS"],"location":"Denver Colorado","acronym":"SC '17"},"container-title":["Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126908.3126963","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3126908.3126963","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3126908.3126963","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:11:09Z","timestamp":1750212669000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126908.3126963"}},"subtitle":["analyzing the fundamental limits in implementing MPI-3.1"],"short-title":[],"issued":{"date-parts":[[2017,11,12]]},"references-count":32,"alternative-id":["10.1145\/3126908.3126963","10.1145\/3126908"],"URL":"https:\/\/doi.org\/10.1145\/3126908.3126963","relation":{},"subject":[],"published":{"date-parts":[[2017,11,12]]},"assertion":[{"value":"2017-11-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}