{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T05:35:39Z","timestamp":1740807339045,"version":"3.38.0"},"reference-count":52,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2017,8,23]],"date-time":"2017-08-23T00:00:00Z","timestamp":1503446400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,1]]},"abstract":"<jats:p> On future extreme scale computers, it is expected that faults will become an increasingly serious problem as the number of individual components grows and failures become more frequent. This is driving the interest in designing algorithms with built-in fault tolerance that can continue to operate and that can replace data even if part of the computation is lost in a failure. For fault-free computations, the use of adaptive refinement techniques in combination with finite element methods is well established. Furthermore, iterative solution techniques that incorporate information about the grid structure, such as the parallel geometric multigrid method, have been shown to be an efficient approach to solving various types of partial differential equations. In this article, we present an advanced parallel adaptive multigrid method that uses dynamic data structures to store a nested sequence of meshes and the iteratively evolving solution. After a fail-stop fault, the data residing on the faulty processor will be lost. 
However, with suitably designed data structures, the neighbouring processors contain enough information so that a consistent mesh can be reconstructed in the faulty domain with the goal of resuming the computation without having to restart from scratch. This recovery is based on a set of carefully designed distributed algorithms that build on the existing parallel adaptive refinement routines, but which must be carefully augmented and extended. <\/jats:p>","DOI":"10.1177\/1094342017720801","type":"journal-article","created":{"date-parts":[[2017,8,23]],"date-time":"2017-08-23T10:00:36Z","timestamp":1503482436000},"page":"189-211","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":4,"title":["Algorithm-based fault recovery of adaptively refined parallel multilevel grids"],"prefix":"10.1177","volume":"33","author":[{"given":"Linda","family":"Stals","sequence":"first","affiliation":[{"name":"Australian National University, MSI, Canberra, Australia"}]}],"member":"179","published-online":{"date-parts":[[2017,8,23]]},"reference":[{"volume-title":"Towards resilient parallel linear Krylov solvers: recover-restart strategies. Research Report","year":"2013","author":"Agullo E","key":"bibr1-1094342017720801"},{"key":"bibr2-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/12.9736"},{"key":"bibr3-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33518-1_24"},{"key":"bibr4-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1177\/1094342013488238"},{"key":"bibr5-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1137\/0613023"},{"key":"bibr6-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611970753"},{"key":"bibr7-1094342017720801","unstructured":"Bridges PG, Ferreira KB, Heroux MA, (2012) Fault-tolerant linear solvers via selective reliability. 
arXiv preprint arXiv: 1206.1390."},{"key":"bibr8-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009106189"},{"key":"bibr9-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009347767"},{"key":"bibr10-1094342017720801","first-page":"1","volume":"1","author":"Cappello F","year":"2014","journal-title":"Supercomputing Frontiers and Innovations"},{"key":"bibr11-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304590"},{"key":"bibr12-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442533"},{"key":"bibr13-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2008.58"},{"key":"bibr14-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718133.ch10"},{"key":"bibr15-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1006\/jcph.2000.6593"},{"key":"bibr16-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1007\/s00791-016-0270-6"},{"key":"bibr17-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2004.11.016"},{"key":"bibr18-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1145\/2493123.2462920"},{"key":"bibr19-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2013.0276"},{"key":"bibr20-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45255-9_47"},{"key":"bibr21-1094342017720801","doi-asserted-by":"publisher","DOI":"10.4208\/nmtma.2015.w10si"},{"key":"bibr22-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2015.07.003"},{"key":"bibr23-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-02427-0"},{"key":"bibr24-1094342017720801","volume-title":"Fault-Tolerance Techniques for High-Performance Computing. 
Computer Communications and Networks","author":"Herault T","year":"2015","edition":"1"},{"key":"bibr25-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676475"},{"key":"bibr26-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1137\/15M1026122"},{"key":"bibr27-1094342017720801","unstructured":"Hursey J (2010) Coordinated checkpoint\/restart process fault tolerance for MPI applications on HPC systems. PhD Thesis, Indiana University."},{"key":"bibr28-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1063\/1.373784"},{"key":"bibr29-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1007\/s10915-015-0068-6"},{"key":"bibr30-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1016\/0743-7315(88)90027-5"},{"key":"bibr31-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470411"},{"key":"bibr32-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1016\/0899-8248(91)90015-M"},{"key":"bibr33-1094342017720801","unstructured":"Mitchell WF (1988) Unified multilevel adaptive finite element methods for elliptic problems. PhD Thesis, Department Of Computer Science, University Of Illinois at Urbana-Champaign, Urbana, IL. 
Technical Report UIUCDCS-R-88-1436."},{"key":"bibr34-1094342017720801","first-page":"224","volume":"6","author":"Mitchell WF","year":"1998","journal-title":"Electronic Transactions on Numerical Analysis"},{"key":"bibr35-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.18"},{"key":"bibr36-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1063\/1.369157"},{"key":"bibr37-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.1993.22"},{"key":"bibr38-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1137\/0730011"},{"key":"bibr39-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611970968"},{"key":"bibr40-1094342017720801","unstructured":"Sewell EG (1972) Automatic generation of triangulations for piecewise polynomial approximation. PhD Thesis, Purdue University, West Lafayette, IN."},{"key":"bibr41-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2013.145"},{"key":"bibr42-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2012.6263938"},{"key":"bibr43-1094342017720801","unstructured":"Stals L (1995) Parallel multigrid on unstructured grids using adaptive finite element methods. PhD Thesis, Department Of Mathematics, Australian National University, Australia."},{"key":"bibr44-1094342017720801","first-page":"488","volume-title":"9th international conference on domain decomposition","author":"Stals L","year":"1998"},{"volume-title":"Tenth SIAM conference on parallel processing for scientific computing","year":"2001","author":"Stals L","key":"bibr45-1094342017720801"},{"volume-title":"The solution of radiation transport equations with adaptive finite elements. 
Technical Report","year":"2001","author":"Stals L","key":"bibr46-1094342017720801"},{"key":"bibr47-1094342017720801","first-page":"78","volume-title":"Proceedings of the tenth copper mountain conference on multigrid methods, ETNA","volume":"15","author":"Stals L","year":"2003"},{"key":"bibr48-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1007\/s10915-014-9898-x"},{"key":"bibr49-1094342017720801","doi-asserted-by":"publisher","DOI":"10.21914\/anziamj.v46i0.975"},{"key":"bibr50-1094342017720801","doi-asserted-by":"publisher","DOI":"10.1007\/s00791-006-0033-x"},{"key":"bibr51-1094342017720801","first-page":"663","volume-title":"Computational Techniques and Applications: CTAC97","author":"Stals L","year":"1998"},{"key":"bibr52-1094342017720801","doi-asserted-by":"publisher","DOI":"10.2172\/1089785"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342017720801","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342017720801","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342017720801","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T14:31:47Z","timestamp":1740753107000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342017720801"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,8,23]]},"references-count":52,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,1]]}},"alternative-id":["10.1177\/1094342017720801"],"URL":"https:\/\/doi.org\/10.1177\/1094342017720801","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2017,8,23]]}}}