{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T03:46:57Z","timestamp":1772164017905,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,1,26]],"date-time":"2017-01-26T00:00:00Z","timestamp":1485388800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Tsinghua University Initiative Scientific Research Program"},{"name":"National Natural Science Foundation of China","award":["61232008"],"award-info":[{"award-number":["61232008"]}]},{"name":"National High-Tech Research and Development Plan of China (863 project)","award":["2015AA01A301"],"award-info":[{"award-number":["2015AA01A301"]}]},{"name":"Microsoft Research Asia Collaborative Research Program","award":["FY16-RES-THEME-095"],"award-info":[{"award-number":["FY16-RES-THEME-095"]}]},{"name":"National Key Research and Development Program of China","award":["2016YFB0200100"],"award-info":[{"award-number":["2016YFB0200100"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,1,26]]},"DOI":"10.1145\/3018743.3018745","type":"proceedings-article","created":{"date-parts":[[2017,1,27]],"date-time":"2017-01-27T13:41:04Z","timestamp":1485524464000},"page":"401-413","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Self-Checkpoint"],"prefix":"10.1145","author":[{"given":"Xiongchao","family":"Tang","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Jidong","family":"Zhai","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Bowen","family":"Yu","sequence":"additional","affiliation":[{"name":"Tsinghua University, 
Beijing, China"}]},{"given":"Wenguang","family":"Chen","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Weimin","family":"Zheng","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2017,1,26]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"top500 website. http:\/\/top500.org\/.  top500 website. http:\/\/top500.org\/."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006248"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063427"},{"key":"e_1_3_2_1_4_1","first-page":"477","volume-title":"A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI","author":"Bland W.","year":"2012","unstructured":"W. Bland , P. Du , A. Bouteiller , T. Herault , G. Bosilca , and J. Dongarra . A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI . In SpringerLink, pages 477 -- 488 . Springer Berlin Heidelberg , Aug. 2012 . URL http:\/\/link.springer.com\/chapter\/10.1007\/978--3--642--32820--6_48. DOI: 10.1007\/978--3--642--32820--6\\_48. 10.1007\/978--3--642--32820--6 W. Bland, P. Du, A. Bouteiller, T. Herault, G. Bosilca, and J. Dongarra. A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI. In SpringerLink, pages 477--488. Springer Berlin Heidelberg, Aug. 2012. URL http:\/\/link.springer.com\/chapter\/10.1007\/978--3--642--32820--6_48. 
DOI: 10.1007\/978--3--642--32820--6\\_48."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/762761.762815"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1048935.1050176"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1996130.1996142"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442533"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065944.1065973"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1995896.1995923"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654117"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/11846802_26"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-013-0884-0"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2013.6575356"},{"key":"e_1_3_2_1_15_1","series-title":"Lecture Notes in Computer Science","first-page":"346","volume-title":"Supporting Dynamic Applications in a Dynamic World","author":"Fagg G. E.","year":"2000","unstructured":"G. E. Fagg and J. J. Dongarra . FT-MPI: Fault Tolerant MPI , Supporting Dynamic Applications in a Dynamic World . In J. Dongarra, P. Kacsuk, and N. Podhorszki, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, number 1908 in Lecture Notes in Computer Science , pages 346 -- 353 . Springer Berlin Heidelberg , 2000. ISBN 978--3--540--41010--2, 978--3--540--45255--3. G. E. Fagg and J. J. Dongarra. FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World. In J. Dongarra, P. Kacsuk, and N. Podhorszki, editors, Recent Advances in Parallel Virtual Machine and Message Passing Interface, number 1908 in Lecture Notes in Computer Science, pages 346--353. Springer Berlin Heidelberg, 2000. 
ISBN 978--3--540--41010--2, 978--3--540--45255--3."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063443"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2477183.2478115"},{"key":"e_1_3_2_1_18_1","first-page":"40","article-title":"Distributed Diskless Checkpoint for Large Scale Systems. pages 63--72. IEEE, 2010","author":"Gomez L. A. B.","year":"2010","unstructured":"L. A. B. Gomez , N. Maruyama , F. Cappello , and S. Matsuoka . Distributed Diskless Checkpoint for Large Scale Systems. pages 63--72. IEEE, 2010 . ISBN 978--1--4244--6987--1. 10.1109\/CCGRID. 2010 . 40 . L. A. B. Gomez, N. Maruyama, F. Cappello, and S. Matsuoka. Distributed Diskless Checkpoint for Large Scale Systems. pages 63--72. IEEE, 2010. ISBN 978--1--4244--6987--1. 10.1109\/CCGRID.2010.40.","journal-title":"ISBN 978--1--4244--6987--1. 10.1109\/CCGRID."},{"key":"e_1_3_2_1_19_1","first-page":"494","volume-title":"Journal of Physics: Conference Series","volume":"46","author":"Hargrove P. H.","unstructured":"P. H. Hargrove and J. C. Duell . Berkeley lab checkpoint\/restart (blcr) for linux clusters . In Journal of Physics: Conference Series , volume 46 , page 494 . IOP Publishing, 2006. P. H. Hargrove and J. C. Duell. Berkeley lab checkpoint\/restart (blcr) for linux clusters. In Journal of Physics: Conference Series, volume 46, page 494. IOP Publishing, 2006."},{"issue":"6","key":"e_1_3_2_1_20_1","first-page":"518","article-title":"Algorithm-based fault tolerance for matrix operations. Computers","volume":"100","author":"Huang K.-H.","year":"1984","unstructured":"K.-H. Huang and J. A. Abraham . Algorithm-based fault tolerance for matrix operations. Computers , IEEE Transactions on , 100 ( 6 ): 518 -- 528 , 1984 . K.-H. Huang and J. A. Abraham. Algorithm-based fault tolerance for matrix operations. 
Computers, IEEE Transactions on, 100 (6): 518--528, 1984.","journal-title":"IEEE Transactions on"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542326"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503226"},{"key":"e_1_3_2_1_23_1","first-page":"1","volume-title":"Storage and Analysis (SC), 2010 International Conference for","author":"Moody A.","year":"2010","unstructured":"A. Moody , G. Bronevetsky , K. Mohror , and B. De Supinski . Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System. In High Performance Computing, Networking , Storage and Analysis (SC), 2010 International Conference for , pages 1 -- 11 , Nov. 2010 . 10.1109\/SC.2010.18. A. Moody, G. Bronevetsky, K. Mohror, and B. De Supinski. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System. In High Performance Computing, Networking, Storage and Analysis (SC), 2010 International Conference for, pages 1--11, Nov. 2010. 10.1109\/SC.2010.18."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/50202.50214"},{"key":"e_1_3_2_1_25_1","unstructured":"A. Petitet R. C. Whaley J. Dongarra and A. Cleary. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. http:\/\/www.netlib.org\/benchmark\/hpl\/.  A. Petitet R. C. Whaley J. Dongarra and A. Cleary. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. http:\/\/www.netlib.org\/benchmark\/hpl\/."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/FTCS.1994.315631"},{"issue":"10","key":"e_1_3_2_1_27_1","first-page":"972","article-title":"Diskless checkpointing. Parallel and Distributed Systems","volume":"9","author":"Plank J. S.","year":"1998","unstructured":"J. S. Plank , K. Li , and M. A. Puening . Diskless checkpointing. 
Parallel and Distributed Systems , IEEE Transactions on , 9 ( 10 ): 972 -- 986 , 1998 . J. S. Plank, K. Li, and M. A. Puening. Diskless checkpointing. Parallel and Distributed Systems, IEEE Transactions on, 9 (10): 972--986, 1998.","journal-title":"IEEE Transactions on"},{"key":"e_1_3_2_1_28_1","first-page":"2014","article-title":"Fault-tolerance techniques for computing at scale","author":"Robert Y.","year":"2014","unstructured":"Y. Robert . Fault-tolerance techniques for computing at scale . CCGrid 2014 , 2014 . Y. Robert. Fault-tolerance techniques for computing at scale. CCGrid2014, 2014.","journal-title":"CCGrid"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2009.4"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370307"},{"key":"e_1_3_2_1_31_1","volume-title":"International Conference on Parallel and Distributed Systems","author":"Wang C.","year":"2011","unstructured":"Wang, Mueller, Engelmann, and Scott]wang2011hybrid C. Wang , F. Mueller , C. Engelmann , and S. L. Scott . Hybrid full\/incremental checkpoint\/restart for mpi jobs in hpc environments . In International Conference on Parallel and Distributed Systems , 2011 . Wang, Mueller, Engelmann, and Scott]wang2011hybridC. Wang, F. Mueller, C. Engelmann, and S. L. Scott. Hybrid full\/incremental checkpoint\/restart for mpi jobs in hpc environments. In International Conference on Parallel and Distributed Systems, 2011."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2011.6152716"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/9780470546345"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600212.2600232"},{"key":"e_1_3_2_1_35_1","volume-title":"A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism. arXiv preprint arXiv:1106.4213","author":"Yao E.","year":"2011","unstructured":"E. Yao , M. Chen , R. Wang , W. Zhang , and G. Tan . 
A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism. arXiv preprint arXiv:1106.4213 , 2011 . E. Yao, M. Chen, R. Wang, W. Zhang, and G. Tan. A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism. arXiv preprint arXiv:1106.4213, 2011."},{"key":"e_1_3_2_1_36_1","first-page":"438","article-title":"A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism. pages 438--448","author":"Yao E.","year":"2012","unstructured":"E. Yao , R. Wang , M. Chen , G. Tan , and N. Sun . A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism. pages 438--448 . IEEE , May 2012 . ISBN 978-1-4673-0975-2, 978-0-7695-4675-9. 10.1109\/IPDPS.2012.48. E. Yao, R. Wang, M. Chen, G. Tan, and N. Sun. A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism. pages 438--448. IEEE, May 2012. ISBN 978-1-4673-0975-2, 978-0-7695-4675-9. 10.1109\/IPDPS.2012.48.","journal-title":"IEEE"},{"key":"e_1_3_2_1_37_1","first-page":"93","volume-title":"IEEE International Conference on Cluster Computing","author":"Zheng G.","year":"2004","unstructured":"G. Zheng , L. Shi , and L. V. Kale . Ftc-charm++: an in-memory checkpoint-based fault tolerant runtime for charm++ and mpi . In IEEE International Conference on Cluster Computing , pages 93 -- 103 , Sept 2004 . G. Zheng, L. Shi, and L. V. Kale. Ftc-charm++: an in-memory checkpoint-based fault tolerant runtime for charm++ and mpi. 
In IEEE International Conference on Cluster Computing, pages 93--103, Sept 2004."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSNW.2012.6264677"}],"event":{"name":"PPoPP '17: 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming","location":"Austin Texas USA","acronym":"PPoPP '17","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3018743.3018745","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3018743.3018745","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:24:10Z","timestamp":1750206250000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3018743.3018745"}},"subtitle":["An In-Memory Checkpoint Method Using Less Space and Its Practice on Fault-Tolerant HPL"],"short-title":[],"issued":{"date-parts":[[2017,1,26]]},"references-count":38,"alternative-id":["10.1145\/3018743.3018745","10.1145\/3018743"],"URL":"https:\/\/doi.org\/10.1145\/3018743.3018745","relation":{"is-identical-to":[{"id-type":"doi","id":"10.1145\/3155284.3018745","asserted-by":"object"}]},"subject":[],"published":{"date-parts":[[2017,1,26]]},"assertion":[{"value":"2017-01-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}