{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T08:57:26Z","timestamp":1768208246967,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2011,11,12]],"date-time":"2011-11-12T00:00:00Z","timestamp":1321056000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2011,11,12]]},"DOI":"10.1145\/2063384.2063428","type":"proceedings-article","created":{"date-parts":[[2011,11,8]],"date-time":"2011-11-08T13:32:09Z","timestamp":1320759129000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":55,"title":["Checkpointing strategies for parallel jobs"],"prefix":"10.1145","author":[{"given":"Marin","family":"Bougeret","sequence":"first","affiliation":[{"name":"ENS Lyon, France"}]},{"given":"Henri","family":"Casanova","sequence":"additional","affiliation":[{"name":"Univ. of Hawai'i at M\u0101noa, Honolulu"}]},{"given":"Mikael","family":"Rabie","sequence":"additional","affiliation":[{"name":"ENS Lyon, France"}]},{"given":"Yves","family":"Robert","sequence":"additional","affiliation":[{"name":"ENS Lyon, France"}]},{"given":"Fr\u00e9d\u00e9ric","family":"Vivien","sequence":"additional","affiliation":[{"name":"INRIA, Lyon, France"}]}],"member":"320","published-online":{"date-parts":[[2011,11,12]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1465482.1465560"},{"key":"e_1_3_2_1_2_1","unstructured":"L. Bautista Gomez A. Nukada N. Maruyama F. Cappello and S. Matsuoka. Transparent low-overhead checkpoint for GPU-accelerated clusters. https:\/\/wiki.ncsa.illinois.edu\/download\/attachments\/17630761\/INRIA-UIUC-WS4-lbautista.pdf?version=1&modificationDate=1290470402000.  L. Bautista Gomez A. Nukada N. Maruyama F. Cappello and S. Matsuoka. Transparent low-overhead checkpoint for GPU-accelerated clusters. https:\/\/wiki.ncsa.illinois.edu\/download\/attachments\/17630761\/INRIA-UIUC-WS4-lbautista.pdf?version=1&modificationDate=1290470402000."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","DOI":"10.1137\/1.9780898719642","volume-title":"ScaLAPACK Users' Guide","author":"Blackford L. S.","year":"1997","unstructured":"L. S. Blackford , J. Choi , A. Cleary , E. D'Azevedo , J. Demmel , I. Dhillon , J. Dongarra , S. Hammarling , G. Henry , A. Petitet , K. Stanley , D. Walker , and R. C. Whaley . ScaLAPACK Users' Guide . SIAM , 1997 . L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, 1997."},{"key":"e_1_3_2_1_4_1","volume-title":"GUC'2009","author":"Bland A.","year":"2009","unstructured":"A. Bland , R. Kendall , D. Kothe , J. Rogers , and G. Shipman . Jaguar: The World's Most Powerful Computer . In GUC'2009 , 2009 . A. Bland, R. Kendall, D. Kothe, J. Rogers, and G. Shipman. Jaguar: The World's Most Powerful Computer. In GUC'2009, 2009."},{"key":"e_1_3_2_1_6_1","series-title":"LNCS","first-page":"206","volume-title":"PPAM","author":"Bouguerra M.-S.","year":"2010","unstructured":"M.-S. Bouguerra , T. Gautier , D. Trystram , and J.-M. Vincent . A flexible checkpoint\/restart model in distributed systems . In PPAM , volume 6067 of LNCS , pages 206 -- 215 , 2010 . M.-S. Bouguerra, T. Gautier, D. Trystram, and J.-M. Vincent. A flexible checkpoint\/restart model in distributed systems. In PPAM, volume 6067 of LNCS, pages 206--215, 2010."},{"key":"e_1_3_2_1_7_1","volume-title":"INRIA","author":"Bouguerra M. S.","year":"2010","unstructured":"M. S. Bouguerra , D. Trystram , and F. Wagner . An optimal algorithm for scheduling checkpoints with variable costs. Technical report , INRIA , Oct. 2010 . M. S. Bouguerra, D. Trystram, and F. Wagner. An optimal algorithm for scheduling checkpoints with variable costs. Technical report, INRIA, Oct. 2010."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2010.26"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.452.0311"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2004.11.016"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009347714"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568525"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/511399.511362"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536302"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851476.1851509"},{"key":"e_1_3_2_1_16_1","first-page":"381","volume-title":"FTCS '95","author":"Kolettis N.","year":"1995","unstructured":"N. Kolettis and N. D. Fulton . Software rejuvenation: Analysis, module and applications . In FTCS '95 , page 381 , Washington, DC, USA , 1995 . IEEE CS. N. Kolettis and N. D. Fulton. Software rejuvenation: Analysis, module and applications. In FTCS '95, page 381, Washington, DC, USA, 1995. IEEE CS."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2010.71"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.2197"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.936236"},{"key":"e_1_3_2_1_20_1","first-page":"1","volume-title":"IPDPS 2008","author":"Liu Y.","year":"2008","unstructured":"Y. Liu , R. Nassar , C. Leangsuksun , N. Naksinehaboon , M. Paun , and S. Scott . An optimal checkpoint\/restart model for a large scale high performance computing system . In IPDPS 2008 , pages 1 -- 9 . IEEE, 2008 . Y. Liu, R. Nassar, C. Leangsuksun, N. Naksinehaboon, M. Paun, and S. Scott. An optimal checkpoint\/restart model for a large scale high performance computing system. In IPDPS 2008, pages 1--9. IEEE, 2008."},{"key":"e_1_3_2_1_21_1","unstructured":"E. Meneses. Clustering Parallel Applications to Enhance Message Logging Protocols. https:\/\/wiki.ncsa.illinois.edu\/download\/attachments\/17630761\/INRIA-UIUC-WS4-emenese.pdf?version=1&modificationDate=1290466786000.  E. Meneses. Clustering Parallel Applications to Enhance Message Logging Protocols. https:\/\/wiki.ncsa.illinois.edu\/download\/attachments\/17630761\/INRIA-UIUC-WS4-emenese.pdf?version=1&modificationDate=1290466786000."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.18"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2006.22"},{"key":"e_1_3_2_1_24_1","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman M. L.","year":"2005","unstructured":"M. L. Puterman . Markov Decision Processes: Discrete Stochastic Dynamic Programming . Wiley , 2005 . M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 2005."},{"key":"e_1_3_2_1_25_1","volume-title":"Exascale software study: Software challenges in extreme scale systems","author":"Sarkar V.","year":"2009","unstructured":"V. Sarkar and others. Exascale software study: Software challenges in extreme scale systems , 2009 . White paper available at: http:\/\/users.ece.gatech.edu\/mrichard\/ExascaleComputingStudyReports\/ECSS%20report%20101909.pdf. V. Sarkar and others. Exascale software study: Software challenges in extreme scale systems, 2009. White paper available at: http:\/\/users.ece.gatech.edu\/mrichard\/ExascaleComputingStudyReports\/ECSS%20report%20101909.pdf."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2006.5"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/190.357398"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1137\/0213039"},{"issue":"08","key":"e_1_3_2_1_29_1","first-page":"2690","article-title":"Analysis of Dependencies of Checkpoint Cost and Checkpoint Interval of Fault Tolerant MPI Applications","volume":"2","author":"Venkatesh K.","year":"2010","unstructured":"K. Venkatesh . Analysis of Dependencies of Checkpoint Cost and Checkpoint Interval of Fault Tolerant MPI Applications . Analysis , 2 ( 08 ): 2690 -- 2697 , 2010 . K. Venkatesh. Analysis of Dependencies of Checkpoint Cost and Checkpoint Interval of Fault Tolerant MPI Applications. Analysis, 2(08):2690--2697, 2010.","journal-title":"Analysis"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2005.67"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/361147.361115"}],"event":{"name":"SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis","location":"Seattle Washington","acronym":"SC '11","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IEEE-CS Computer Society"]},"container-title":["Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2063384.2063428","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2063384.2063428","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:07Z","timestamp":1750241167000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2063384.2063428"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,11,12]]},"references-count":30,"alternative-id":["10.1145\/2063384.2063428","10.1145\/2063384"],"URL":"https:\/\/doi.org\/10.1145\/2063384.2063428","relation":{},"subject":[],"published":{"date-parts":[[2011,11,12]]},"assertion":[{"value":"2011-11-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}