{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:13:05Z","timestamp":1759133585923,"version":"3.41.0"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2016,7,20]],"date-time":"2016-07-20T00:00:00Z","timestamp":1468972800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"G8 Research Councils Initiative on Multilateral Research"},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["GSC 111"],"award-info":[{"award-number":["GSC 111"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]},{"name":"U.S. Department of Energy by Lawrence Livermore National Laboratory","award":["DE-AC52-07NA27344 (LLNL-JRNL-663039)"],"award-info":[{"award-number":["DE-AC52-07NA27344 (LLNL-JRNL-663039)"]}]},{"name":"Interdisciplinary Program on Application Software towards Exascale Computing for Global Scale Issues is gratefully acknowledged"},{"name":"Helmholtz Association of German Research Centers","award":["VH-NG-118"],"award-info":[{"award-number":["VH-NG-118"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2016,8,8]]},"abstract":"<jats:p>Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira, Jr., et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. By replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances, even for runs with hundreds of thousands of processes.<\/jats:p>","DOI":"10.1145\/2934661","type":"journal-article","created":{"date-parts":[[2016,7,21]],"date-time":"2016-07-21T15:13:24Z","timestamp":1469114004000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Identifying the Root Causes of Wait States in Large-Scale Parallel Applications"],"prefix":"10.1145","volume":"3","author":[{"given":"David","family":"B\u00f6hme","sequence":"first","affiliation":[{"name":"Lawrence Livermore National Laboratory, USA"}]},{"given":"Markus","family":"Geimer","sequence":"additional","affiliation":[{"name":"J\u00fclich Supercomputing Centre, Germany"}]},{"given":"Lukas","family":"Arnold","sequence":"additional","affiliation":[{"name":"J\u00fclich Supercomputing Centre, Germany"}]},{"given":"Felix","family":"Voigtlaender","sequence":"additional","affiliation":[{"name":"RWTH Aachen University, Germany"}]},{"given":"Felix","family":"Wolf","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Darmstadt, Germany"}]}],"member":"320","published-online":{"date-parts":[[2016,7,20]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Accelerated Strategic Computing Initiative. 1995. The ASCI SWEEP3D Benchmark Code. (1995). http:\/\/www.ccs3.lanl.gov\/pal\/software\/sweep3d\/sweep3d_readme.html.  Accelerated Strategic Computing Initiative. 1995. The ASCI SWEEP3D Benchmark Code. (1995). http:\/\/www.ccs3.lanl.gov\/pal\/software\/sweep3d\/sweep3d_readme.html."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.v22:6"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.12.012"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.120"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2010.18"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2003.08.002"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201908)","author":"Gamblin Todd","key":"e_1_2_1_7_1","unstructured":"Todd Gamblin , Bronis R. de Supinski , Martin Schulz , Rob Fowler , and Daniel A. Reed . 2008. Scalable load-balance measurement for SPMD codes . In Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201908) . Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Rob Fowler, and Daniel A. Reed. 2008. Scalable load-balance measurement for SPMD codes. In Proceedings of the ACM\/IEEE Conference on Supercomputing (SC\u201908)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2009.02.003"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1088\/1367-2630\/9\/7\/218"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1088\/1367-2630\/8\/9\/186"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1086\/504594"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488551.2488569"},{"key":"e_1_2_1_13_1","series-title":"Lecture Notes in Control and Information Sciences","volume-title":"Proceedings of the Workshop on Wide Area Networks and High Performance Computing","author":"Hoisie Adolfy","unstructured":"Adolfy Hoisie , Olaf Lubeck , and Harvey Wasserman . 1999. Performance analysis of wavefront algorithms on very-large scale distributed systems . In Proceedings of the Workshop on Wide Area Networks and High Performance Computing , Lecture Notes in Control and Information Sciences , Vol. 249 . Springer Berlin\/Heidelberg , 171--187. DOI:http:\/\/dx.doi.org\/10.1007\/BFb0110074 10.1007\/BFb0110074 Adolfy Hoisie, Olaf Lubeck, and Harvey Wasserman. 1999. Performance analysis of wavefront algorithms on very-large scale distributed systems. In Proceedings of the Workshop on Wide Area Networks and High Performance Computing, Lecture Notes in Control and Information Sciences, Vol. 249. Springer Berlin\/Heidelberg, 171--187. DOI:http:\/\/dx.doi.org\/10.1007\/BFb0110074"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/238020.238024"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1647539.1647584"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the Conference on Parallel Computing (ParCo","volume":"33","author":"Malony Allen D.","year":"2005","unstructured":"Allen D. Malony , Sameer S. Shende , and Alan Morris . 2005 . Phase-based parallel performance profiling . In Proceedings of the Conference on Parallel Computing (ParCo , Malaga, Spain) (NIC Series) , Vol. 33 . John von Neumann Institute for Computing, 203--210. Allen D. Malony, Sameer S. Shende, and Alan Morris. 2005. Phase-based parallel performance profiling. In Proceedings of the Conference on Parallel Computing (ParCo, Malaga, Spain) (NIC Series), Vol. 33. John von Neumann Institute for Computing, 203--210."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/281035.281046"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/238020.238023"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85451-7_8"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTR.2005.347035"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87475-1_28"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/301104.301117"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654097"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69814-2_8"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.47"},{"key":"e_1_2_1_26_1","volume-title":"The Community Earth System Model. (Feb","author":"University Corporation for Atmospheric Research (UCAR). 2012.","year":"2012","unstructured":"University Corporation for Atmospheric Research (UCAR). 2012. The Community Earth System Model. (Feb . 2012 ). http:\/\/www.cesm.ucar.edu\/. University Corporation for Atmospheric Research (UCAR). 2012. The Community Earth System Model. (Feb. 2012). http:\/\/www.cesm.ucar.edu\/."},{"key":"e_1_2_1_27_1","volume-title":"Report of the Workshop on Software Development Tools for Petascale Computing. (August","author":"Ed Jeffrey Vetter","year":"2007","unstructured":"Jeffrey Vetter ( Ed .). 2007 . Report of the Workshop on Software Development Tools for Petascale Computing. (August 2007). U.S. Department of Energy, http:\/\/www.csm.ornl.gov\/workshops\/Petascale07\/sdtpc_workshop_report.pdf. Jeffrey Vetter (Ed.). 2007. Report of the Workshop on Software Development Tools for Petascale Computing. (August 2007). U.S. Department of Energy, http:\/\/www.csm.ornl.gov\/workshops\/Petascale07\/sdtpc_workshop_report.pdf."},{"key":"e_1_2_1_28_1","volume-title":"SC\u201912 Workshop on Extreme-Scale Performance Tools.","author":"Wylie Brian J. N.","year":"2012","unstructured":"Brian J. N. Wylie . 2012 . Parallel performance measurement and analysis scaling lessons . SC\u201912 Workshop on Extreme-Scale Performance Tools. Brian J. N. Wylie. 2012. Parallel performance measurement and analysis scaling lessons. SC\u201912 Workshop on Extreme-Scale Performance Tools."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2010.5470816"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2934661","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2934661","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:39:47Z","timestamp":1750217987000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2934661"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,20]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,8,8]]}},"alternative-id":["10.1145\/2934661"],"URL":"https:\/\/doi.org\/10.1145\/2934661","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2016,7,20]]},"assertion":[{"value":"2013-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-07-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}