{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T22:57:15Z","timestamp":1777676235950,"version":"3.51.4"},"reference-count":18,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2004,8,1]],"date-time":"2004-08-01T00:00:00Z","timestamp":1091318400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2004,8]]},"abstract":"<jats:p>In this paper we examine the topic of writing fault-tolerant Message Passing Interface (MPI) applications. We discuss the meaning of fault tolerance in general and what the MPI Standard has to say about it. We survey several approaches to this problem, namely checkpointing, restructuring a class of standard MPI programs, modifying MPI semantics, and extending the MPI specification. 
We conclude that, within certain constraints, MPI can provide a useful context for writing application programs that exhibit significant degrees of fault tolerance.<\/jats:p>","DOI":"10.1177\/1094342004046045","type":"journal-article","created":{"date-parts":[[2004,9,6]],"date-time":"2004-09-06T19:07:52Z","timestamp":1094497672000},"page":"363-372","source":"Crossref","is-referenced-by-count":87,"title":["Fault Tolerance in Message Passing Interface Programs"],"prefix":"10.1177","volume":"18","author":[{"given":"William","family":"Gropp","sequence":"first","affiliation":[{"name":"Mathematics and Computer Science Division Argonne National Laboratory, Argonne, IL 60439, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ewing","family":"Lusk","sequence":"additional","affiliation":[{"name":"Mathematics and Computer Science Division Argonne National Laboratory, Argonne, IL 60439, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2004,8,1]]},"reference":[{"key":"atypb1","volume-title":"Proceedings of the 1st IEEE International Symposium of Cluster Computing and the Grid","author":"Batchu, R."},{"key":"atypb2","volume-title":"Proceedings of SC 2002, IEEE","author":"Bosilca, G."},{"key":"atypb3","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(01)00097-7"},{"key":"atypb4","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45255-9_47"},{"key":"atypb5","author":"Fagg, G.E.","year":"2004","journal-title":"International Journal of High Performance Computer Applications and Supercomputing"},{"key":"atypb6","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(01)00100-4"},{"key":"atypb7","author":"Geist, A.","year":"2004","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"atypb8","volume-title":"Transaction Processing","author":"Gray, J.","year":"1993"},{"key":"atypb9","first-page":"530","volume-title":"Proceedings of 
the 7th IEEE Symposium on Parallel and Distributed Processing","author":"Gropp, W."},{"key":"atypb10","volume-title":"Using MPI: Portable Parallel Programming with the Message Passing Interface","author":"Gropp, W.","year":"1999","edition":"2"},{"issue":"3","key":"atypb11","first-page":"150","volume":"20","author":"Li, K.","year":"1992","journal-title":"International Journal of Parallel Processing"},{"key":"atypb12","doi-asserted-by":"publisher","DOI":"10.1109\/71.298215"},{"key":"atypb13","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626400000342"},{"issue":"3","key":"atypb14","first-page":"165","volume":"8","author":"Message Passing Interface Forum","year":"1994","journal-title":"International Journal of Supercomputer Applications"},{"key":"atypb15","volume-title":"The MPI message-passing interface standard","author":"Message Passing Interface Forum","year":"1995"},{"key":"atypb16","first-page":"48","volume-title":"Symposium on Fault-Tolerant Computing","author":"Rao, S."},{"key":"atypb17","volume-title":"MPI\u2014The Complete Reference: Volume 1, The MPI Core","author":"Snir, M.","year":"1998","edition":"2"},{"key":"atypb18","first-page":"526","volume-title":"Proceedings of IPPS \u201996. 
The 10th International Parallel Processing Symposium","author":"Stellner, G."}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342004046045","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342004046045","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:18:05Z","timestamp":1777450685000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342004046045"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,8]]},"references-count":18,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2004,8]]}},"alternative-id":["10.1177\/1094342004046045"],"URL":"https:\/\/doi.org\/10.1177\/1094342004046045","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2004,8]]}}}