{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:23:49Z","timestamp":1759134229518},"reference-count":14,"publisher":"World Scientific Pub Co Pte Ltd","issue":"04","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Parallel Process. Lett."],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:p>The fastest supercomputers today such as Blue Gene\/L, Blue Gene\/P, Cray XT3 and XT4 are connected by a three-dimensional torus\/mesh interconnect. Applications running on these machines can benefit from topology-awareness while mapping tasks to processors at runtime. By co-locating communicating tasks on nearby processors, the distance traveled by messages and hence the communication traffic can be minimized, thereby reducing communication latency and contention on the network. This paper describes preliminary work utilizing this technique and performance improvements resulting from it in the context of an n-dimensional k-point stencil program. It shows that even for simple benchmarks, topology-aware mapping can have a significant impact on performance. Automated topology-aware mapping by the runtime using similar ideas can relieve the application writer from this burden and result in better performance. Preliminary work towards achieving this for a molecular dynamics application, NAMD, is also presented. 
Results on up to 32,768 processors of IBM's Blue Gene\/L, 4,096 processors of IBM's Blue Gene\/P and 2,048 processors of Cray's XT3 support the ideas discussed in the paper.<\/jats:p>","DOI":"10.1142\/s0129626408003569","type":"journal-article","created":{"date-parts":[[2008,12,4]],"date-time":"2008-12-04T11:40:57Z","timestamp":1228390857000},"page":"549-566","source":"Crossref","is-referenced-by-count":25,"title":["Benefits of Topology Aware Mapping for Mesh Interconnects"],"prefix":"10.1142","volume":"18","author":[{"given":"Abhinav","family":"Bhatel\u00e9","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA"}]},{"given":"Laxmikant V.","family":"Kal\u00e9","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf4","unstructured":"Laxmikant V.\u00a0Kale, Petascale Computing: Algorithms and Applications, ed. D.\u00a0Bader (Chapman & Hall, CRC Press, 2008)\u00a0pp. 421\u2013441."},{"key":"rf5","doi-asserted-by":"crossref","unstructured":"L. V.\u00a0Kale and Sanjeev\u00a0Krishnan, Parallel Programming using C++, eds. Gregory V.\u00a0Wilson and Paul\u00a0Lu (MIT Press, 1996)\u00a0pp. 175\u2013213.","DOI":"10.7551\/mitpress\/5241.003.0009"},{"key":"rf6","unstructured":"Klaus\u00a0Schulten, Petascale Computing: Algorithms and Applications, ed. D.\u00a0Bader (Chapman & Hall, CRC Press, 2008)\u00a0pp. 165\u2013181."},{"key":"rf8","first-page":"207","volume":"30","author":"Bokhari Shahid H.","journal-title":"IEEE Trans. Computers"},{"key":"rf10","first-page":"433","volume":"36","author":"Lee Soo-Young","journal-title":"IEEE Trans. Computers"},{"key":"rf11","first-page":"1408","volume":"36","author":"Sadayappan P.","journal-title":"IEEE Trans. 
Computers"},{"key":"rf13","doi-asserted-by":"publisher","DOI":"10.1109\/12.76410"},{"key":"rf14","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(87)90018-9"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1142\/S0129053392000134"},{"key":"rf22","doi-asserted-by":"publisher","DOI":"10.1147\/rd.492.0489"},{"key":"rf25","doi-asserted-by":"publisher","DOI":"10.1147\/rd.521.0159"},{"key":"rf26","doi-asserted-by":"publisher","DOI":"10.1147\/rd.521.0177"},{"key":"rf31","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1016\/j.future.2004.11.020","volume":"22","author":"Kale Laxmikant V.","journal-title":"Future Generation Computer Systems Special Issue on: Large-Scale System Performance Modeling and Analysis"},{"key":"rf32","volume-title":"UPC and Grids in Action","author":"Catlett C.","year":"2007"}],"container-title":["Parallel Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0129626408003569","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,4]],"date-time":"2024-03-04T16:16:50Z","timestamp":1709569010000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0129626408003569"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12]]},"references-count":14,"journal-issue":{"issue":"04","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2008,12]]}},"alternative-id":["10.1142\/S0129626408003569"],"URL":"https:\/\/doi.org\/10.1142\/s0129626408003569","relation":{},"ISSN":["0129-6264","1793-642X"],"issn-type":[{"value":"0129-6264","type":"print"},{"value":"1793-642X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,12]]}}}