{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:17:43Z","timestamp":1750306663228,"version":"3.41.0"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2015,4,13]],"date-time":"2015-04-13T00:00:00Z","timestamp":1428883200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF award CCF-0953210","award":["s"],"award-info":[{"award-number":["s"]}]},{"name":"Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357"},{"name":"U.S. National Science Foundation award CNS-0958512"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2015,5,21]]},"abstract":"<jats:p>Next-generation HPC computing platforms are likely to be characterized by significant, unpredictable nonuniformities in execution time among compute nodes and cores. The resulting load imbalances from this nonuniformity are expected to arise from a variety of sources\u2014manufacturing discrepancies, dynamic power management, runtime component failure, OS jitter, software-mediated resiliency, and TLB\/cache performance variations, for example. It is well understood that existing algorithms with frequent points of bulk synchronization will perform relatively poorly in the presence of these sources of process nonuniformity. Thus, recasting classic bulk synchronous algorithms into more asynchronous, coarse-grained parallelism is a critical area of research for next-generation computing. 
We propose a class of parallel algorithms for explicit stencil computations that can tolerate these nonuniformities by decoupling per process communication and computation in order for each process to progress asynchronously while maintaining solution correctness. These algorithms are benchmarked with a 1D domain decomposed (\u201cslabbed\u201d) implementation of the 2D heat equation as a model problem, and are tested in the presence of simulated nonuniform process execution rates. The resulting performance is compared to a classic bulk synchronous implementation of the model problem. Results show that the runtime of this article\u2019s algorithm on a machine with simulated process nonuniformities is 5--99% slower than the runtime of its classic counterpart on a machine free of nonuniformities. However, when both algorithms are run on a machine with comparable synthetic process nonuniformities, this article\u2019s algorithm is 1--37 times faster than its classic counterpart.<\/jats:p>","DOI":"10.1145\/2742351","type":"journal-article","created":{"date-parts":[[2015,4,14]],"date-time":"2015-04-14T12:32:19Z","timestamp":1429014739000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Noise-Tolerant Explicit Stencil Computations for Nonuniform Process Execution Rates"],"prefix":"10.1145","volume":"2","author":[{"given":"Adam","family":"Hammouda","sequence":"first","affiliation":[{"name":"Argonne National Laboratory, Argonne, IL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew R.","family":"Siegel","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Argonne, IL"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephen F.","family":"Siegel","sequence":"additional","affiliation":[{"name":"University of Delaware, Newark, 
DE"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,4,13]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_2_1_1_1","DOI":"10.1007\/11602569_31"},{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1145\/582034.582086"},{"doi-asserted-by":"crossref","unstructured":"J. A. Ang R. F. Barrett R. E. Benner D. Burke C. Chan D. Donofrio et al. 2014. Abstract Machine Models and Proxy Architectures for Exascale Computing. Technical Report. Sandia National Laboratories and Lawrence Berkeley National Laboratory Berkeley CA.","key":"e_1_2_1_3_1","DOI":"10.1109\/Co-HPC.2014.4"},{"doi-asserted-by":"publisher","key":"e_1_2_1_4_1","DOI":"10.1016\/S0022-0000(75)80018-3"},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.1109\/TDSC.2004.2"},{"volume-title":"Principles of Model Checking","author":"Baier Christel","key":"e_1_2_1_6_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1145\/2312005.2312044"},{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.1145\/322063.322067"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1109\/CLUSTR.2006.311846"},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1007\/978-3-642-13374-9_16"},{"doi-asserted-by":"publisher","key":"e_1_2_1_11_1","DOI":"10.1016\/j.parco.2010.12.005"},{"volume-title":"ECSS Report TR-2008-13. The Department of Energy","year":"2008","author":"Bergman Keren","key":"e_1_2_1_12_1"},{"unstructured":"David L. Brown Paul Messina Pete Beckman David Keyes Jeffery Vetter Mihai Anitescu et al. 2010. Cross Cutting Technologies For Computing At the Exascale. Technical Report. U.S. 
Department of Energy (DOE) Office of Advanced Scientific Computing Research and the National Nuclear Security Administration Washington DC.","key":"e_1_2_1_13_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.1016\/0024-3795(69)90028-7"},{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1145\/973097.973098"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1016\/0377-0427(89)90045-9"},{"volume-title":"Proceedings of the 2nd Workshop on Exascale Evaluation and Research Techniques, Held in Conjunction with ASPLOS","year":"2011","author":"Davis John D.","key":"e_1_2_1_18_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.1145\/2408776.2408794"},{"volume-title":"Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS\u201908)","year":"2008","author":"Demmel J.","key":"e_1_2_1_20_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.1145\/582034.582084"},{"doi-asserted-by":"publisher","key":"e_1_2_1_22_1","DOI":"10.1016\/j.jcp.2014.06.017"},{"key":"e_1_2_1_23_1","first-page":"21","article-title":"Cache optimization for structured and unstructured grid multigrid","volume":"10","author":"Douglas Craig C.","year":"2000","journal-title":"Electronic Transactions on Numerical 
Analysis"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1145\/957717.957772"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.1109\/CLUSTER.2010.41"},{"doi-asserted-by":"publisher","key":"e_1_2_1_26_1","DOI":"10.5555\/1413370.1413390"},{"doi-asserted-by":"publisher","key":"e_1_2_1_27_1","DOI":"10.1145\/1088149.1088197"},{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1086\/317361"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1007\/11945918_45"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.5555\/2388996.2389132"},{"doi-asserted-by":"publisher","key":"e_1_2_1_31_1","DOI":"10.1109\/PDCAT.2010.86"},{"volume-title":"Using MPI: Portable Parallel Programming with the Message-Passing Interface","author":"Gropp William","edition":"2","key":"e_1_2_1_32_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_33_1","DOI":"10.1142\/S0129626409000420"},{"doi-asserted-by":"publisher","key":"e_1_2_1_34_1","DOI":"10.1109\/SC.2010.12"},{"doi-asserted-by":"publisher","key":"e_1_2_1_35_1","DOI":"10.1145\/256167.256201"},{"doi-asserted-by":"publisher","key":"e_1_2_1_36_1","DOI":"10.1109\/HPCSim.2013.6641391"},{"doi-asserted-by":"publisher","key":"e_1_2_1_37_1","DOI":"10.1145\/1178597.1178605"},{"doi-asserted-by":"publisher","key":"e_1_2_1_38_1","DOI":"10.1016\/S0020-0190(98)00061-1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_39_1","DOI":"10.1145\/1953611.1953615"},{"volume-title":"Proceedings of the 4th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS13)","author":"Levy S.","key":"e_1_2_1_40_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_41_1","DOI":"10.1145\/361227.361234"},{"key":"e_1_2_1_42_1","first-page":"219","article-title":"Dynamic BSP: Towards a flexible approach to parallel computing over the grid","volume":"2004","author":"Martin Jeremy M. 
R.","year":"2004","journal-title":"Communicating Process Architectures"},{"volume-title":"Time Skewing: A Value-Based Approach to Optimizing for Memory Locality. Technical Report","year":"1999","author":"Mccalpin John","key":"e_1_2_1_43_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_44_1","DOI":"10.1145\/1542275.1542313"},{"doi-asserted-by":"publisher","key":"e_1_2_1_45_1","DOI":"10.1145\/1362622.1362662"},{"doi-asserted-by":"publisher","key":"e_1_2_1_46_1","DOI":"10.1145\/1048935.1050204"},{"doi-asserted-by":"publisher","key":"e_1_2_1_47_1","DOI":"10.1145\/2414729.2414737"},{"doi-asserted-by":"publisher","key":"e_1_2_1_48_1","DOI":"10.1145\/1932682.1869525"},{"doi-asserted-by":"publisher","key":"e_1_2_1_49_1","DOI":"10.5555\/370049.370403"},{"doi-asserted-by":"publisher","key":"e_1_2_1_51_1","DOI":"10.1109\/IPDPSW.2012.116"},{"volume-title":"Scientific Parallel Computing","author":"Scott L. Ridgway","key":"e_1_2_1_52_1"},{"doi-asserted-by":"crossref","unstructured":"M. Snir R. W. Wisniewski J. A. Abraham S. V. Adve S. Bagchi Pavan Balaji J. Belak P. Bose F. Cappello B. Carlson Andrew A. Chien P. Coteus N. A. Debardeleben P. Diniz C. Engelmann M. Erez S. Fazzari A. Geist R. Gupta F. Johnson Sriram Krishnamoorthy Sven Leyffer D. Liberty S. Mitra T. S. Munson R. Schreiber J. Stearley and E. V. Hensbergen. 2013. Addressing Failures in Exascale Computing. 
Technical Report Argonne National Laboratory Argonne IL.","key":"e_1_2_1_53_1","DOI":"10.2172\/1078029"},{"doi-asserted-by":"publisher","key":"e_1_2_1_54_1","DOI":"10.1007\/s00165-010-0163-2"},{"doi-asserted-by":"publisher","key":"e_1_2_1_55_1","DOI":"10.5555\/645455.653905"},{"doi-asserted-by":"publisher","key":"e_1_2_1_56_1","DOI":"10.1145\/1088149.1088190"},{"doi-asserted-by":"publisher","key":"e_1_2_1_57_1","DOI":"10.1145\/79173.79181"},{"doi-asserted-by":"publisher","key":"e_1_2_1_58_1","DOI":"10.1007\/978-3-540-87744-8_2"},{"doi-asserted-by":"publisher","key":"e_1_2_1_59_1","DOI":"10.1142\/S0129626403001318"},{"doi-asserted-by":"publisher","key":"e_1_2_1_60_1","DOI":"10.5555\/846234.849346"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2742351","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2742351","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:00:33Z","timestamp":1750230033000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2742351"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,4,13]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,5,21]]}},"alternative-id":["10.1145\/2742351"],"URL":"https:\/\/doi.org\/10.1145\/2742351","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2015,4,13]]},"assertion":[{"value":"2013-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2014-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-04-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}