{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,24]],"date-time":"2026-01-24T18:52:57Z","timestamp":1769280777906,"version":"3.49.0"},"reference-count":277,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2021,12,10]],"date-time":"2021-12-10T00:00:00Z","timestamp":1639094400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:p> This work is based on the seminar titled \u2018Resiliency in Numerical Algorithm Design for Extreme Scale Simulations\u2019 held March 1\u20136, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of involving an enormous amount of resources and costs. A typical large-scale computation running for 48\u00a0h on a system consuming 20\u00a0MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10<jats:sup>23<\/jats:sup> floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? 
Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. 
This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically. <\/jats:p>","DOI":"10.1177\/10943420211055188","type":"journal-article","created":{"date-parts":[[2021,12,10]],"date-time":"2021-12-10T09:13:05Z","timestamp":1639127585000},"page":"251-285","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":5,"title":["Resiliency in numerical algorithm design for extreme scale simulations"],"prefix":"10.1177","volume":"36","author":[{"given":"Emmanuel","family":"Agullo","sequence":"first","affiliation":[{"name":"Inria, France"}]},{"given":"Mirco","family":"Altenbernd","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Stuttgart, Germany"}]},{"given":"Hartwig","family":"Anzt","sequence":"additional","affiliation":[{"name":"KIT \u2013 Karlsruher Institut f\u00fcr Technologie, Germany"}]},{"given":"Leonardo","family":"Bautista-Gomez","sequence":"additional","affiliation":[{"name":"Barcelona Supercomputing Center, Spain"}]},{"given":"Tommaso","family":"Benacchio","sequence":"additional","affiliation":[{"name":"Politecnico di Milano, Italy"}]},{"given":"Luca","family":"Bonaventura","sequence":"additional","affiliation":[{"name":"Politecnico di Milano, Italy"}]},{"given":"Hans-Joachim","family":"Bungartz","sequence":"additional","affiliation":[{"name":"TU M\u00fcnchen, Germany"}]},{"given":"Sanjay","family":"Chatterjee","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation, USA"}]},{"given":"Florina M","family":"Ciorba","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Basel, Switzerland"}]},{"given":"Nathan","family":"DeBardeleben","sequence":"additional","affiliation":[{"name":"Los Alamos National Laboratory, 
USA"}]},{"given":"Daniel","family":"Drzisga","sequence":"additional","affiliation":[{"name":"TU M\u00fcnchen, Germany"}]},{"given":"Sebastian","family":"Eibl","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Erlangen, N\u00fcrnberg, Germany"}]},{"given":"Christian","family":"Engelmann","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, USA"}]},{"given":"Wilfried N","family":"Gansterer","sequence":"additional","affiliation":[{"name":"University of Vienna, Austria"}]},{"given":"Luc","family":"Giraud","sequence":"additional","affiliation":[{"name":"Inria, France"}]},{"given":"Dominik","family":"G\u00f6ddeke","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Stuttgart, Germany"}]},{"given":"Marco","family":"Heisig","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Erlangen, N\u00fcrnberg, Germany"}]},{"given":"Fabienne","family":"J\u00e9z\u00e9quel","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Paris 2, Paris, France"}]},{"given":"Nils","family":"Kohl","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Erlangen, N\u00fcrnberg, Germany"}]},{"given":"Xiaoye Sherry","family":"Li","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, USA"}]},{"given":"Romain","family":"Lion","sequence":"additional","affiliation":[{"name":"University of Bordeaux, France"}]},{"given":"Miriam","family":"Mehl","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Stuttgart, Germany"}]},{"given":"Paul","family":"Mycek","sequence":"additional","affiliation":[{"name":"Cerfacs, France"}]},{"given":"Michael","family":"Obersteiner","sequence":"additional","affiliation":[{"name":"TU M\u00fcnchen, Germany"}]},{"given":"Enrique S","family":"Quintana-Ort\u00ed","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Val\u00e8ncia, Spain"}]},{"given":"Francesco","family":"Rizzi","sequence":"additional","affiliation":[{"name":"NexGen 
Analytics, USA"}]},{"given":"Ulrich","family":"R\u00fcde","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Erlangen, N\u00fcrnberg, Germany"},{"name":"Cerfacs, France"}]},{"given":"Martin","family":"Schulz","sequence":"additional","affiliation":[{"name":"TU M\u00fcnchen, Germany"}]},{"given":"Fred","family":"Fung","sequence":"additional","affiliation":[{"name":"Australian National University, Australia"}]},{"given":"Robert","family":"Speck","sequence":"additional","affiliation":[{"name":"Forschungszentrum J\u00fclich GmbH, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1557-8393","authenticated-orcid":false,"given":"Linda","family":"Stals","sequence":"additional","affiliation":[{"name":"Australian National University, Australia"}]},{"given":"Keita","family":"Teranishi","sequence":"additional","affiliation":[{"name":"Sandia National Laboratories, California, USA"}]},{"given":"Samuel","family":"Thibault","sequence":"additional","affiliation":[{"name":"University of Bordeaux, France"}]},{"given":"Dominik","family":"Th\u00f6nnes","sequence":"additional","affiliation":[{"name":"Universit\u00e4t Erlangen, N\u00fcrnberg, Germany"}]},{"given":"Andreas","family":"Wagner","sequence":"additional","affiliation":[{"name":"TU M\u00fcnchen, Germany"}]},{"given":"Barbara","family":"Wohlmuth","sequence":"additional","affiliation":[{"name":"TU M\u00fcnchen, Germany"}]}],"member":"179","published-online":{"date-parts":[[2021,12,10]]},"reference":[{"key":"bibr1-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2015.9"},{"key":"bibr2-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/15M1042115"},{"key":"bibr3-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1002\/nla.2059"},{"key":"bibr4-10943420211055188","volume-title":"On Soft Errors in the Conjugate Gradient method: Sensitivity and Robust Numerical Detection - Revised","author":"Agullo 
E","year":"2020"},{"key":"bibr5-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/16M1097274"},{"key":"bibr6-10943420211055188","volume-title":"A Posteriori Error Estimation in Finite Element Analysis","volume":"37","author":"Ainsworth M","year":"2011"},{"key":"bibr7-10943420211055188","volume-title":"Introduction to Interval Analysis","author":"Alefeld G","year":"1983"},{"key":"bibr8-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/HPCSim.2015.7237082"},{"key":"bibr9-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015628056"},{"key":"bibr10-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342016684006"},{"key":"bibr11-10943420211055188","volume-title":"Proceedings of HPCSE\u201919","author":"Altenbernd M","year":"2021"},{"key":"bibr12-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2832080.2832081"},{"key":"bibr13-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2016.11.013"},{"key":"bibr14-10943420211055188","first-page":"813","volume-title":"European conference on parallel processing","author":"Ashraf RA","year":"2018"},{"key":"bibr15-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3184407.3184421"},{"key":"bibr16-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2751504.2751507"},{"key":"bibr17-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.1999.809458"},{"key":"bibr18-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2004.2"},{"key":"bibr19-10943420211055188","volume-title":"Adaptive Finite Element Methods for Differential Equations","author":"Bangerth 
W","year":"2013"},{"key":"bibr20-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2049673.2049678"},{"key":"bibr21-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/0730048"},{"key":"bibr22-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/322063.322067"},{"key":"bibr23-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.apnum.2017.07.006"},{"key":"bibr24-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/17M1148384"},{"key":"bibr25-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2018.12.006"},{"key":"bibr26-10943420211055188","author":"Bauer M","year":"2020","journal-title":"Computers & Mathematics with Applications"},{"key":"bibr27-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063427"},{"key":"bibr28-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1996.0124"},{"key":"bibr29-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/P3HPC49587.2019.00012"},{"key":"bibr30-10943420211055188","doi-asserted-by":"publisher","DOI":"10.2172\/1607968"},{"key":"bibr31-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342021990433"},{"key":"bibr32-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.39"},{"key":"bibr33-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3086157.3086162"},{"key":"bibr34-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342014532297"},{"key":"bibr35-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.advwatres.2011.02.016"},{"key":"bibr36-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43659-3_31"},{"key":"bibr37-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2735971"},{"key":"bibr38-10943420211055188","volume-title":"Parallel and Distributed Computation, Numerical Methods","author":"Bertsekas 
D","year":"1989"},{"key":"bibr39-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/BF02591967"},{"key":"bibr40-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33518-1"},{"key":"bibr41-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.camwa.2019.11.023"},{"key":"bibr42-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2008.12.002"},{"key":"bibr43-10943420211055188","volume-title":"Fault-tolerant linear solvers via selective reliability","author":"Bridges PG","year":"2012"},{"key":"bibr44-10943420211055188","first-page":"123","volume":"1337","author":"Buchwald S","year":"2015","journal-title":"CEUR Workshop Proceedings"},{"key":"bibr45-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-003-0016-4"},{"key":"bibr46-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1017\/S0962492904000182"},{"key":"bibr47-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-14325-5_47"},{"key":"bibr48-10943420211055188","volume-title":"Proceedings of the Symposium on High Performance Computing","author":"Calhoun 
J","year":"2015"},{"key":"bibr49-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/s10915-018-0778-7"},{"key":"bibr50-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009347767"},{"key":"bibr51-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/s10596-018-9764-2"},{"key":"bibr52-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2015.04.003"},{"key":"bibr53-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342018762531"},{"key":"bibr54-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2019.8891034"},{"key":"bibr55-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2019.00013"},{"key":"bibr56-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/0024-3795(69)90028-7"},{"key":"bibr57-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2008.58"},{"key":"bibr58-10943420211055188","volume-title":"Comprehensive Algorithmic Resilience for Numeric Applications","author":"Chen S","year":"2013"},{"key":"bibr59-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2517327.2442533"},{"key":"bibr60-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342016664796"},{"key":"bibr61-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/140968896"},{"key":"bibr62-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-20119-1_1"},{"key":"bibr63-10943420211055188","unstructured":"Cluster File Systems, Inc (2007) Lustre: a scalable, high-performance file system. White paper. 
Available at https:\/\/cse.buffalo.edu\/faculty\/tkosar\/cse710\/papers\/lustre-whitepaper.pdf."},{"key":"bibr64-10943420211055188","first-page":"15:1","volume-title":"Proceedings of the 25th high performance computing symposium","author":"Coleman E","year":"2017"},{"key":"bibr65-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2006.15"},{"key":"bibr66-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SMC-IT.2011.29"},{"key":"bibr67-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ARITH.2016.31"},{"key":"bibr68-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2517639"},{"key":"bibr69-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.11"},{"key":"bibr70-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.122"},{"key":"bibr71-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3229710.3229717"},{"key":"bibr72-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/361179.361202"},{"key":"bibr73-10943420211055188","unstructured":"DLS4LB Team (2020) Dynamic loop self-scheduling for load balancing. 
URL https:\/\/github.com\/unibas-dmi-hpc\/DLS4LB."},{"key":"bibr74-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/16M1106304"},{"key":"bibr75-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2370036.2145845"},{"key":"bibr76-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626411000151"},{"key":"bibr77-10943420211055188","first-page":"35","volume":"21","author":"Eberhart P","year":"2015","journal-title":"Reliable Computing"},{"key":"bibr78-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.07.003"},{"key":"bibr79-10943420211055188","doi-asserted-by":"crossref","first-page":"e2279","DOI":"10.1002\/nla.2279","volume":"27","author":"El Haddad M","year":"2020","journal-title":"Numerical Linear Algebra and Applications"},{"key":"bibr80-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2012.56"},{"key":"bibr81-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.123"},{"key":"bibr82-10943420211055188","volume-title":"Resilient Iterative Linear Solvers Running Through Errors","author":"Elliott J","year":"2015"},{"key":"bibr83-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568525"},{"key":"bibr84-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2013.114"},{"key":"bibr85-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/PDP.2009.31"},{"key":"bibr86-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-28596-8"},{"key":"bibr87-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1002\/nme.1620320604"},{"key":"bibr88-10943420211055188","doi-asserted-by":"crossref","unstructured":"F\u00e9votte F, Lathuili\u00e8re B. (2019) Debugging and optimization of HPC programs in mixed precision with the verrou tool. 
In: Computational reproducibility at exascale workshop (CRE2018), in conjunction with the international conference on high performance computing, networking, storage and analysis (SC18), Dallas, USA, 2019.","DOI":"10.1109\/Correctness49594.2019.00006"},{"key":"bibr89-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.49"},{"key":"bibr90-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/DS-RT.2016.27"},{"key":"bibr91-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926295"},{"key":"bibr92-10943420211055188","first-page":"1","volume":"66","author":"Fok P","year":"2015","journal-title":"Journal of Scientific Computing"},{"key":"bibr93-10943420211055188","unstructured":"Forum M (2019) The message passing interface standard. URL https:\/\/www.mpi-forum.org\/."},{"key":"bibr94-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2665073"},{"key":"bibr95-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-0427(00)00409-X"},{"key":"bibr96-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2014.78"},{"key":"bibr97-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ICPPW.2016.56"},{"key":"bibr98-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2133173.2133177"},{"key":"bibr99-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2013.01.006"},{"key":"bibr100-10943420211055188","volume-title":"Synchronous and Asynchronous Optimized Schwarz Method for Poisson\u2019s Equation in Rectangular Domains","author":"Garay 
JC","year":"2017"},{"key":"bibr101-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143885"},{"key":"bibr102-10943420211055188","doi-asserted-by":"publisher","DOI":"10.21914\/anziamj.v48i0.70"},{"key":"bibr103-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/BF01934907"},{"key":"bibr104-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126972"},{"key":"bibr105-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2012.04.018"},{"key":"bibr106-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1023\/A:1019129717644"},{"key":"bibr107-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-003-0015-5"},{"key":"bibr108-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2005.76"},{"key":"bibr109-10943420211055188","first-page":"08172","volume":"1808","author":"Glusa C","year":"2018","journal-title":"CoRR"},{"key":"bibr110-10943420211055188","volume-title":"Scalable asynchronous domain decomposition solvers","author":"Glusa C","year":"2019"},{"key":"bibr111-10943420211055188","doi-asserted-by":"publisher","DOI":"10.4208\/nmtma.2015.w10si"},{"key":"bibr112-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2015.07.003"},{"key":"bibr113-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1090\/mcom\/3459"},{"key":"bibr114-10943420211055188","first-page":"263","volume-title":"Iterative Methods in Lin. 
Alg","author":"Griebel M","year":"2013"},{"key":"bibr115-10943420211055188","doi-asserted-by":"publisher","DOI":"10.2140\/camcos.2017.12.25"},{"key":"bibr116-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.128"},{"key":"bibr117-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43659-3_47"},{"key":"bibr118-10943420211055188","volume-title":"Solving Ordinary Differential Equations I: Nonstiff Problems","author":"Hairer E","year":"2008","edition":"3"},{"key":"bibr119-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/140964448"},{"key":"bibr120-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-04537-5_7"},{"key":"bibr121-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/46\/1\/067"},{"key":"bibr122-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2017.7975296"},{"key":"bibr123-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.78"},{"key":"bibr124-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2009.08.004"},{"key":"bibr125-10943420211055188","volume-title":"Euro-Par 2016: Parallel Processing Workshops","author":"Heene M","year":"2016"},{"key":"bibr126-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-68394-2_31"},{"key":"bibr127-10943420211055188","doi-asserted-by":"publisher","DOI":"10.6028\/jres.049.044"},{"key":"bibr128-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-20943-2"},{"key":"bibr129-10943420211055188","volume-title":"Accuracy and Stability of Numerical Algorithms","author":"Higham N","year":"1996","edition":"1"},{"key":"bibr130-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-40528-5_9"},{"key":"bibr131-10943420211055188","unstructured":"Hoemmen M, Heroux MA (2011) Fault-tolerant iterative methods via selective reliability. 
In: Proceedings of the IEEE\/ACM conference on supercomputing (SC\u201911), Tampa Florida, 11\u201317 November 2006. http:\/\/www.researchgate.net\/publication\/228940928_Fault-Tolerant_Iterative_Methods_via_Selective_Reliability\/file\/79e41508060305e4ad.pdf."},{"key":"bibr132-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676475"},{"key":"bibr133-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/15M1026122"},{"key":"bibr134-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342018817088"},{"key":"bibr135-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3147704.3147718"},{"key":"bibr136-10943420211055188","doi-asserted-by":"publisher","DOI":"10.2172\/1436045"},{"key":"bibr137-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2014.7040999"},{"key":"bibr138-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-31619-1_5"},{"key":"bibr139-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.119"},{"key":"bibr140-10943420211055188","unstructured":"Jakeman JD, Roberts SG (2011) Local and dimension adaptive sparse grid interpolation and quadrature. arXiv preprint arXiv:11100010. 1110.0010v1."},{"key":"bibr141-10943420211055188","doi-asserted-by":"crossref","unstructured":"Jaulmes L, Casas M, Ayguad\u00e9 MM, et al. (2015) Exploiting asynchrony from exact forward recovery for detected and uncorrected errors in iterative solvers. 
In: SC \u201915: proceedings of the international conference for high performance computing, networking, storage and analysis, Austin, TX, USA, 15\u201320 November 2015.","DOI":"10.1145\/2807591.2807599"},{"key":"bibr142-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503249"},{"key":"bibr143-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ICYCS.2008.329"},{"key":"bibr144-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2016.68"},{"key":"bibr145-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2676870.2676883"},{"key":"bibr146-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/165854.165874"},{"key":"bibr147-10943420211055188","volume-title":"Spectral\/hp Element Methods for Computational Fluid Dynamics","author":"Karniadakis G","year":"2013"},{"key":"bibr148-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2019.00015"},{"key":"bibr149-10943420211055188","volume-title":"OpenCL (Open Computing Language)","author":"Khronos Team","year":"2020"},{"key":"bibr150-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126943"},{"key":"bibr151-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/FMPC.1996.558063"},{"key":"bibr152-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1002\/nme.6237"},{"key":"bibr153-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342018767736"},{"key":"bibr154-10943420211055188","first-page":"309","volume-title":"Parallel computing: current and future issues of high-end computing (proceedings of the international conference parco05), NIC Series, volume 33","author":"Kollias G","year":"2006"},{"key":"bibr155-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7091-0525-2"},{"key":"bibr156-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342015623623"},{"key":"bibr157-10943420211055188","volume-title":"Numerical Methods for 
Ordinary Differential Systems","author":"Lambert J","year":"1991"},{"key":"bibr158-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/040620394"},{"key":"bibr159-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-47956-5_14"},{"key":"bibr160-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1017\/S0962492911000043"},{"key":"bibr161-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611976137.8"},{"key":"bibr162-10943420211055188","first-page":"91","volume-title":"International workshop on performance modeling, benchmarking and simulation of high performance computer systems","author":"Levy S","year":"2013"},{"key":"bibr163-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER49012.2020.00043"},{"key":"bibr164-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3126908.3126915"},{"key":"bibr165-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8622520"},{"key":"bibr166-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/DSNW.2012.6264672"},{"key":"bibr167-10943420211055188","volume-title":"Krylov Subspace Methods","author":"Liesen J","year":"2013"},{"key":"bibr168-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ICASI.2017.7988587"},{"key":"bibr169-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/FTXS51974.2020.00009"},{"key":"bibr170-10943420211055188","unstructured":"Lion R (2019) Tol\u00e9rance aux pannes dans l\u2019ex\u00e9cution distribu\u00e9e de graphes de t\u00e2ches. In: Conf\u00e9rence d\u2019informatique en parall\u00e9lisme, architecture et syst\u00e8me, Anglet, France, 2019. 
https:\/\/hal.inria.fr\/hal-02296118."},{"key":"bibr171-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2018.09.041"},{"key":"bibr172-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2007.45"},{"key":"bibr173-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/s00211-017-0872-z"},{"key":"bibr174-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2016.7761580"},{"key":"bibr175-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3200691.3178502"},{"key":"bibr176-10943420211055188","unstructured":"Mercury Team (2020) MERCURY programming language. URL https:\/\/www.mercurylang.org\/"},{"key":"bibr177-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1017\/S096249290626001X"},{"key":"bibr178-10943420211055188","doi-asserted-by":"publisher","DOI":"10.2172\/1762089"},{"key":"bibr179-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2003.1228507"},{"key":"bibr180-10943420211055188","volume-title":"Future Generation Computer Systems (FGCS) 2019","author":"Mohammed A","year":"2019"},{"key":"bibr181-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/HPCS48598.2019.9188153"},{"key":"bibr182-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2017.01.022"},{"key":"bibr183-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.18"},{"key":"bibr184-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ScalA.2016.010"},{"key":"bibr185-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2012.10.038"},{"key":"bibr186-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/15M1051786"},{"key":"bibr187-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/1274971.1274978"},{"key":"bibr188-10943420211055188","volume-title":"Nanos++","author":"Nanos 
Team","year":"2020"},{"key":"bibr189-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2016.28"},{"key":"bibr190-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00099"},{"key":"bibr191-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972931.12"},{"key":"bibr192-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2909428.2909431"},{"key":"bibr193-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3148226.3148229"},{"key":"bibr194-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/24.994913"},{"key":"bibr195-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-75178-8_46"},{"key":"bibr196-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/FTXS.2018.00009"},{"key":"bibr197-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337849"},{"key":"bibr198-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/FTXS49593.2019.00009"},{"key":"bibr199-10943420211055188","volume-title":"Monte Carlo Arithmetic: a Framework for the Statistical Analysis of Roundoff Error","author":"Parker DS","year":"2001"},{"key":"bibr200-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-22997-3_3"},{"key":"bibr201-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-29400-7_25"},{"key":"bibr202-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973440.51"},{"key":"bibr203-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2807591.2807640"},{"key":"bibr204-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2009.0155"},{"key":"bibr205-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/FTXS.2018.00006"},{"key":"bibr206-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2019.09.006"},{"key":"bibr207-10943420211055188","volume-title":"Numerical approximation of partial differential 
equations","volume":"23","author":"Quarteroni A","year":"2008"},{"key":"bibr208-10943420211055188","doi-asserted-by":"crossref","unstructured":"Radojkovic P, Marazakis M, Carpenter P, et al. (2020) Towards resilient EU HPC Systems: a blueprint. research report, European HPC resilience initiative. https:\/\/hal.inria.fr\/hal-02922257.","DOI":"10.1145\/3310273.3323434"},{"key":"bibr209-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2008.4658655"},{"key":"bibr210-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.34"},{"key":"bibr211-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2909428.2909429"},{"key":"bibr212-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2017.05.005"},{"key":"bibr213-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1541"},{"key":"bibr214-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503271"},{"key":"bibr215-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/0730011"},{"key":"bibr216-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611970968"},{"key":"bibr217-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1090\/conm\/180\/01962"},{"key":"bibr218-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/18M1205492"},{"key":"bibr219-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2530268.2530272"},{"key":"bibr220-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.46"},{"key":"bibr221-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/s10543-006-0095-7"},{"key":"bibr222-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/SPA.2007.5903294"},{"key":"bibr223-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/15M1035240"},{"key":"bibr224-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/17M1128411"},{"key":"bibr225-10943420211055188","doi-asserted-by":"publisher","DOI":"10
.1016\/j.jcp.2013.07.041"},{"key":"bibr226-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2866794"},{"key":"bibr227-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2866794"},{"key":"bibr228-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/1995896.1995922"},{"key":"bibr229-10943420211055188","unstructured":"Southwell RV (1946) Relaxation methods in engineering science, a treatise on approximate computation. Oxford, Oxford Univ. Pr., 1. ed., reprint."},{"key":"bibr230-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2016.12.001"},{"key":"bibr231-10943420211055188","first-page":"73","volume-title":"Parallel Algorithms","author":"Spit\u00e9ri P","year":"1986"},{"key":"bibr232-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.advengsoft.2020.102896"},{"key":"bibr233-10943420211055188","volume-title":"Parallel multigrid on unstructured grids using adaptive finite element methods","author":"Stals 
L","year":"1995"},{"key":"bibr234-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342017720801"},{"key":"bibr235-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/DSNW.2012.6264669"},{"key":"bibr236-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2013.05.182"},{"key":"bibr237-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/PDP.2015.17"},{"key":"bibr238-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2742854.2742903"},{"key":"bibr239-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2016.54"},{"key":"bibr240-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2017.128"},{"key":"bibr241-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2017.40"},{"key":"bibr242-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1177\/1094342016669416"},{"key":"bibr243-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ISPDC.2015.29"},{"key":"bibr244-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1023\/A:1016617309072"},{"key":"bibr245-10943420211055188","volume-title":"The Mystery of Asynchronous Iterations Convergence When the Spectral Radius is One","author":"Szyld DB","year":"1998"},{"key":"bibr246-10943420211055188","first-page":"377","volume-title":"Computational Fluid and Solid Mechanics","author":"Szyld DB","year":"2000"},{"key":"bibr247-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2781257"},{"key":"bibr248-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2907294.2907306"},{"key":"bibr249-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.115"},{"key":"bibr250-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/3208040.3208050"},{"key":"bibr251-10943420211055188","unstructured":"Team CD (2014) Containment domains. 
URL https:\/\/lph.ece.utexas.edu\/public\/CDs\/ContainmentDomains."},{"key":"bibr252-10943420211055188","unstructured":"Team R (2019) Raja performance portability layer. https:\/\/github.com\/LLNL\/RAJA."},{"key":"bibr253-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2642769.2642774"},{"key":"bibr254-10943420211055188","volume-title":"OpenMP (Open Multi-Processing)","author":"The OpenMP Architecture Review Boards","year":"2019"},{"key":"bibr255-10943420211055188","volume-title":"On Runtime Systems for Task-Based Programming on Heterogeneous Platforms","author":"Thibault S","year":"2018"},{"key":"bibr256-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1002\/qj.2544"},{"key":"bibr257-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827599353865"},{"key":"bibr258-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183433"},{"key":"bibr259-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1023\/B:NUMA.0000049483.75679.ce"},{"key":"bibr260-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2002.1003565"},{"key":"bibr261-10943420211055188","doi-asserted-by":"publisher","DOI":"10.5479\/sil.538961.39088011475779"},{"key":"bibr262-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1515\/9781400882618-003"},{"key":"bibr263-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370307"},{"key":"bibr264-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2010.48"},{"key":"bibr265-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2011.10.009"},{"key":"bibr266-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2001.941425"},{"key":"bibr267-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2019.00021"},{"key":"bibr268-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600232"},{"key":"bibr269-10943420211055188","doi-asserted-by":"publisher","DOI":"10.
1145\/2907294.2907315"},{"key":"bibr270-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2019.05.004"},{"key":"bibr271-10943420211055188","first-page":"172","volume-title":"International Workshop on Languages and Compilers for Parallel Computing","author":"Yan Y","year":"2009"},{"key":"bibr272-10943420211055188","unstructured":"Yu J, Jian D, Wu Z, et al. (2011) Thread-level redundancy fault tolerant CMP based on relaxed input replication. In: 2011 6th international conference on computer sciences and convergence information technology (ICCIT), Seogwipo, Korea (South), 29 November\u20131 December 2011, pp. 544\u2013549."},{"key":"bibr273-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/tpds.2020.3043449"},{"key":"bibr274-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1109\/DSNW.2012.6264677"},{"key":"bibr275-10943420211055188","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-17353-5_11"},{"key":"bibr276-10943420211055188","volume-title":"Zoltan: Parallel Partitioning, Load Balancing and Data-Management Services","author":"Zoltan Team","year":"2013"},{"key":"bibr277-10943420211055188","first-page":"37","volume-title":"Integer and nonlinear programming","author":"Zoutendijk G","year":"1970"}],"container-title":["The International Journal of High Performance Computing 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420211055188","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420211055188","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420211055188","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T20:19:07Z","timestamp":1740860347000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420211055188"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,10]]},"references-count":277,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["10.1177\/10943420211055188"],"URL":"https:\/\/doi.org\/10.1177\/10943420211055188","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,10]]}}}