{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T12:37:10Z","timestamp":1774874230857,"version":"3.50.1"},"reference-count":9,"publisher":"World Scientific Pub Co Pte Lt","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Parallel Process. Lett."],"published-print":{"date-parts":[[2011,3]]},"abstract":"<jats:p> Petascale parallel computers with more than a million processing cores are expected to be available in a couple of years. Although MPI is the dominant programming interface today for large-scale systems that at the highest end already have close to 300,000 processors, a challenging question to both researchers and users is whether MPI will scale to processor and core counts in the millions. In this paper, we examine the issue of scalability of MPI to very large systems. We first examine the MPI specification itself and discuss areas with scalability concerns and how they can be overcome. We then investigate issues that an MPI implementation must address in order to be scalable. To illustrate the issues, we ran a number of simple experiments to measure MPI memory consumption at scale up to 131,072 processes, or 80%, of the IBM Blue Gene\/P system at Argonne National Laboratory. Based on the results, we identified nonscalable aspects of the MPI implementation and found ways to tune it to reduce its memory footprint. We also briefly discuss issues in application scalability to large process counts and features of MPI that enable the use of other techniques to alleviate scalability limitations in applications. <\/jats:p>","DOI":"10.1142\/s0129626411000060","type":"journal-article","created":{"date-parts":[[2011,3,24]],"date-time":"2011-03-24T01:21:22Z","timestamp":1300929682000},"page":"45-60","source":"Crossref","is-referenced-by-count":44,"title":["MPI ON MILLIONS OF CORES"],"prefix":"10.1142","volume":"21","author":[{"given":"PAVAN","family":"BALAJI","sequence":"first","affiliation":[{"name":"Argonne National Laboratory, Argonne, IL 60439, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"DARIUS","family":"BUNTINAS","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Argonne, IL 60439, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"DAVID","family":"GOODELL","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Argonne, IL 60439, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"WILLIAM","family":"GROPP","sequence":"additional","affiliation":[{"name":"University of Illinois, Urbana, IL 61801, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"TORSTEN","family":"HOEFLER","sequence":"additional","affiliation":[{"name":"University of Illinois, Urbana, IL 61801, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"SAMEER","family":"KUMAR","sequence":"additional","affiliation":[{"name":"IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"EWING","family":"LUSK","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Argonne, IL 60439, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"RAJEEV","family":"THAKUR","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Argonne, IL 60439, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"JESPER LARSSON","family":"TR\u00c4FF","sequence":"additional","affiliation":[{"name":"Dept. of Scientific Computing, Univ. of Vienna, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf2","doi-asserted-by":"publisher","DOI":"10.1007\/s00450-009-0095-3"},{"key":"rf7","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1206"},{"key":"rf8","volume-title":"Using OpenMP: Portable Shared Memory Parallel Programming","author":"Chapman B.","year":"2007"},{"key":"rf9","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539794262161"},{"key":"rf10","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(01)00100-4"},{"key":"rf15","volume-title":"Using MPI: Portable Parallel Programming with the Message-Passing Interface","author":"Gropp W.","year":"1999"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004046045"},{"key":"rf36","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.nucl.51.101701.132506"},{"key":"rf42","doi-asserted-by":"publisher","DOI":"10.1177\/1094342005051521"}],"container-title":["Parallel Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0129626411000060","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,6]],"date-time":"2019-08-06T16:18:07Z","timestamp":1565108287000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0129626411000060"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,3]]},"references-count":9,"journal-issue":{"issue":"01","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2011,3]]}},"alternative-id":["10.1142\/S0129626411000060"],"URL":"https:\/\/doi.org\/10.1142\/s0129626411000060","relation":{},"ISSN":["0129-6264","1793-642X"],"issn-type":[{"value":"0129-6264","type":"print"},{"value":"1793-642X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,3]]}}}