{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,5]],"date-time":"2022-04-05T08:02:07Z","timestamp":1649145727986},"reference-count":5,"publisher":"World Scientific Pub Co Pte Lt","issue":"04","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Parallel Process. Lett."],"published-print":{"date-parts":[[2009,12]]},"abstract":"<jats:p> In this paper, we present a methodology for profiling parallel applications executing on the family of architectures commonly referred to as the \"Cell\" processor. Specifically, we examine Cell-centric MPI programs on hybrid clusters containing multiple Opteron and IBM PowerXCell 8i processors per node such as those used in the petascale Roadrunner system. We analyze the performance of our approach on a PlayStation3 console based on the Cell Broadband Engine\u2014the CBE\u2014as well as an IBM BladeCenter QS22 based on PowerXCell 8i. Our implementation incurs less than 0.5% overhead and 0.3 \u00b5s per profiler call for a typical molecular dynamics code on the Cell BE while efficiently utilizing the limited local store of the Cell's SPE cores. Our worst-case overhead analysis on the PowerXCell 8i costs 3.2 \u00b5s per profiler call while using only two 5 KiB buffers. We demonstrate the use of our profiler on a cluster of hybrid nodes running a suite of scientific applications. Our analyses of inter-SPE communication (across the entire cluster) and function call patterns provide valuable information that can be used to optimize application performance. 
<\/jats:p>","DOI":"10.1142\/s0129626409000407","type":"journal-article","created":{"date-parts":[[2009,12,14]],"date-time":"2009-12-14T02:52:15Z","timestamp":1260759135000},"page":"535-552","source":"Crossref","is-referenced-by-count":0,"title":["AN MPI PERFORMANCE MONITORING INTERFACE FOR CELL BASED COMPUTE NODES"],"prefix":"10.1142","volume":"19","author":[{"given":"HIKMET","family":"DURSUN","sequence":"first","affiliation":[{"name":"Performance and Architecture Laboratory (PAL), Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA"},{"name":"Collaboratory for Advanced Computing and Simulations, Department of Computer Science, University of Southern California, Los Angeles, California 90089-0242, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"KEVIN J.","family":"BARKER","sequence":"additional","affiliation":[{"name":"Performance and Architecture Laboratory (PAL), Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"DARREN J.","family":"KERBYSON","sequence":"additional","affiliation":[{"name":"Performance and Architecture Laboratory (PAL), Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"SCOTT","family":"PAKIN","sequence":"additional","affiliation":[{"name":"Performance and Architecture Laboratory (PAL), Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"RICHARD","family":"SEYMOUR","sequence":"additional","affiliation":[{"name":"Collaboratory for Advanced Computing and Simulations, Department of Computer Science, University of Southern California, Los Angeles, California 90089-0242, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"RAJIV K.","family":"KALIA","sequence":"additional","affiliation":[{"name":"Collaboratory for Advanced Computing and Simulations, Department of 
Computer Science, University of Southern California, Los Angeles, California 90089-0242, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"AIICHIRO","family":"NAKANO","sequence":"additional","affiliation":[{"name":"Collaboratory for Advanced Computing and Simulations, Department of Computer Science, University of Southern California, Los Angeles, California 90089-0242, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"PRIYA","family":"VASHISHTA","sequence":"additional","affiliation":[{"name":"Collaboratory for Advanced Computing and Simulations, Department of Computer Science, University of Southern California, Los Angeles, California 90089-0242, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1147\/rd.515.0559"},{"key":"rf7","first-page":"1","volume":"17","author":"Michael G.","journal-title":"Scientific Programming"},{"key":"rf9","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.41"},{"key":"rf10","author":"Nomura K.","journal-title":"International Journal of Computer Science"},{"key":"rf14","doi-asserted-by":"publisher","DOI":"10.1177\/1094342007085015"}],"container-title":["Parallel Processing 
Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0129626409000407","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,6]],"date-time":"2019-08-06T16:15:23Z","timestamp":1565108123000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0129626409000407"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,12]]},"references-count":5,"journal-issue":{"issue":"04","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2009,12]]}},"alternative-id":["10.1142\/S0129626409000407"],"URL":"https:\/\/doi.org\/10.1142\/s0129626409000407","relation":{},"ISSN":["0129-6264","1793-642X"],"issn-type":[{"value":"0129-6264","type":"print"},{"value":"1793-642X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,12]]}}}