{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T07:04:41Z","timestamp":1772867081308,"version":"3.50.1"},"reference-count":23,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2019,3,12]],"date-time":"2019-03-12T00:00:00Z","timestamp":1552348800000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DE-NA0002374"],"award-info":[{"award-number":["DE-NA0002374"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2019,3]]},"abstract":"<jats:p> Checkpoint restart plays an important role in high-performance computing (HPC) applications, allowing simulation runtime to extend beyond a single job allocation and facilitating recovery from hardware failure. Yet, as machines grow in size and in complexity, traditional approaches to checkpoint restart are becoming prohibitive. Current methods store a subset of the application\u2019s state and exploit the memory hierarchy in the machine. However, as the energy cost of data movement continues to dominate, further reductions in checkpoint size are needed. Lossy compression, which can significantly reduce checkpoint sizes, offers a potential to reduce computational cost in checkpoint restart. This article investigates the use of numerical properties of partial differential equation (PDE) simulations, such as bounds on the truncation error, to evaluate the feasibility of using lossy compression in checkpointing PDE simulations. 
Restart from a checkpoint with lossy compression is considered for a fail-stop error in two time-dependent HPC application codes: PlasComCM and Nek5000. Results show that error in application variables due to a restart from a lossy compressed checkpoint can be masked by the numerical error in the discretization, leading to increased efficiency in checkpoint restart without influencing overall accuracy in the simulation. <\/jats:p>","DOI":"10.1177\/1094342018762036","type":"journal-article","created":{"date-parts":[[2018,3,13]],"date-time":"2018-03-13T03:22:32Z","timestamp":1520911352000},"page":"397-410","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":34,"title":["Exploring the feasibility of lossy compression for PDE simulations"],"prefix":"10.1177","volume":"33","author":[{"given":"Jon","family":"Calhoun","sequence":"first","affiliation":[{"name":"Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC, USA"}]},{"given":"Franck","family":"Cappello","sequence":"additional","affiliation":[{"name":"Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Luke N","family":"Olson","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA"}]},{"given":"Marc","family":"Snir","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA"}]},{"given":"William D","family":"Gropp","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA"}]}],"member":"179","published-online":{"date-parts":[[2018,3,12]]},"reference":[{"key":"bibr1-1094342018762036","doi-asserted-by":"crossref","unstructured":"Baker AH, Xu H, Dennis JM, (2014) A methodology for evaluating the impact of data compression on 
climate simulation data. In: Proceedings of the 23rd international symposium on high-performance parallel and distributed computing, HPDC \u201814. New York, USA: ACM, pp. 203\u2013214. ISBN 978-1-4503-2749-7. DOI: 10.1145\/2600212.2600217. Available at: http:\/\/doi.acm.org\/10.1145\/2600212.2600217","DOI":"10.1145\/2600212.2600217"},{"key":"bibr2-1094342018762036","doi-asserted-by":"crossref","unstructured":"Bautista-Gomez L, Tsuboi S, Komatitsch D, (2011) FTI: high performance fault tolerance interface for hybrid systems. In: Proceedings of 2011 international conference for high performance computing, networking, storage and analysis, SC \u201811. New York, USA: ACM, pp. 32:1\u201332:32. ISBN 978-1-4503-0771-0. DOI: 10.1145\/2063384.2063427. Available at: http:\/\/doi.acm.org\/10.1145\/2063384.2063427.","DOI":"10.1145\/2063384.2063427"},{"key":"bibr3-1094342018762036","volume":"15","author":"Bergman K","year":"2008","journal-title":"Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO), Technical Report"},{"key":"bibr4-1094342018762036","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2007.44"},{"key":"bibr5-1094342018762036","doi-asserted-by":"publisher","DOI":"10.1177\/1094342009347767"},{"key":"bibr6-1094342018762036","doi-asserted-by":"crossref","unstructured":"Chen Z, Son SW, Hendrix W, (2014) Numarck: machine learning algorithm for resiliency and checkpointing. In: Proceedings of the international conference for high performance computing, networking, storage and analysis, SC \u201814. Piscataway, NJ, USA: IEEE Press, pp. 733\u2013744. ISBN 978-1-4799-5500-8. DOI: 10.1109\/SC.2014.65. 
Available at: http:\/\/dx.doi.org\/10.1109\/SC.2014.65.","DOI":"10.1109\/SC.2014.65"},{"key":"bibr7-1094342018762036","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.11"},{"key":"bibr8-1094342018762036","volume-title":"Lossy Data Compression Reduces Communication Time in Hybrid Time-Parallel Integrators: Technical Report 17-25","author":"Fischer L","year":"2017"},{"key":"bibr9-1094342018762036","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2012.45"},{"key":"bibr10-1094342018762036","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2015.29"},{"key":"bibr11-1094342018762036","doi-asserted-by":"crossref","unstructured":"Lakshminarasimhan S, Shah N, Ethier S, (2011) Euro-Par 2011 Parallel Processing: 17th international conference, Euro-Par 2011, Bordeaux, France, August 29\u2013September 2, 2011, Proceedings, Part I, chapter compressing the incompressible with ISABELA: In-situ reduction of spatiotemporal data. Berlin, Heidelberg: Springer, pp. 366\u2013379. ISBN 978-3-642-23400-2. DOI: 10.1007\/978-3-642-23400-2_34. Available at: http:\/\/dx.doi.org\/10.1007\/978-3-642-23400-2_34.","DOI":"10.1007\/978-3-642-23400-2_34"},{"key":"bibr12-1094342018762036","doi-asserted-by":"crossref","unstructured":"Laney D, Langer S, Weber C, (2013) Assessing the effects of data compression in simulations using physically motivated metrics. In: Proceedings of the international conference on high performance computing, networking, storage and analysis, SC \u201813. New York, USA: ACM, pp. 76:1\u201376:12. ISBN 978-1-4503-2378-9. DOI: 10.1145\/2503210.2503283. Available at: http:\/\/doi.acm.org\/10.1145\/2503210.2503283.","DOI":"10.1145\/2503210.2503283"},{"key":"bibr13-1094342018762036","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346458"},{"key":"bibr14-1094342018762036","doi-asserted-by":"crossref","unstructured":"Lindstrom P, Isenburg M (2006) Fast and efficient compression of floating-point data. 
IEEE Transactions on Visualization and Computer Graphics 12(5): 1245\u20131250. DOI: 10.1109\/tvcg.2006.143. Available at: http:\/\/dx.doi.org\/10.1109\/tvcg.2006.143.","DOI":"10.1109\/TVCG.2006.143"},{"key":"bibr15-1094342018762036","doi-asserted-by":"crossref","unstructured":"Liu N, Cope J, Carns PH, (2012) On the role of burst buffers in leadership-class storage systems. In: IEEE 28th symposium on mass storage systems and technologies, MSST 2012, April 16-20, 2012, Asilomar Conference Grounds, Pacific Grove, CA, USA, pp. 1\u201311. DOI: 10.1109\/MSST.2012.6232369. Available at: http:\/\/dx.doi.org\/10.1109\/MSST.2012.6232369.","DOI":"10.1109\/MSST.2012.6232369"},{"key":"bibr16-1094342018762036","doi-asserted-by":"crossref","unstructured":"Luu H, Winslett M, Gropp W, (2015) A multiplatform study of I\/O behavior on petascale supercomputers. In: Proceedings of the 24th international symposium on high-performance parallel and distributed computing, HPDC \u201815. New York, USA: ACM, pp. 33\u201344. ISBN 978-1-4503-3550-8. DOI: 10.1145\/2749246.2749269. Available at: http:\/\/doi.acm.org\/10.1145\/2749246.2749269.","DOI":"10.1145\/2749246.2749269"},{"key":"bibr17-1094342018762036","doi-asserted-by":"crossref","unstructured":"Moody A, Bronevetsky G, Mohror K, (2010) Design, modeling, and evaluation of a scalable multi-level checkpointing system. In: Proceedings of the 2010 ACM\/IEEE international conference for high performance computing, networking, storage and analysis, SC \u201810. Washington, DC, USA: IEEE Computer Society, pp. 1\u201311. ISBN 978-1-4244-7559-9. DOI: 10.1109\/SC.2010.18. 
Available at: http:\/\/dx.doi.org\/10.1109\/SC.2010.18.","DOI":"10.1109\/SC.2010.18"},{"key":"bibr18-1094342018762036","volume-title":"Poster session of the 2014 ACM\/IEEE international conference for high performance computing, networking, storage and analysis, SC \u201814","author":"Ni X","year":"2014"},{"key":"bibr19-1094342018762036","unstructured":"Sardashti S (2015) Using compression for energy-optimized memory hierarchies. PhD Thesis, University of Wisconsin, Madison."},{"key":"bibr20-1094342018762036","doi-asserted-by":"crossref","unstructured":"Sasaki N, Sato K, Endo T, (2015) Exploration of lossy compression for application-level checkpoint\/restart. In: Proceedings of the 2015 IEEE international parallel and distributed processing symposium, IPDPS \u201815. Washington, DC, USA: IEEE Computer Society, pp. 914\u2013922. ISBN 978-1-4799-8649-1. DOI: 10.1109\/IPDPS.2015.67. Available at: http:\/\/dx.doi.org\/10.1109\/IPDPS.2015.67.","DOI":"10.1109\/IPDPS.2015.67"},{"key":"bibr21-1094342018762036","doi-asserted-by":"crossref","unstructured":"Shalf J, Dosanjh S, Morrison J (2010) Exascale computing technology challenges. In: High Performance Computing for Computational Science\u2014VECPAR 2010. Berlin, Heidelberg: Springer, pp. 1\u201325. Available at: http:\/\/dl.acm.org\/citation.cfm?id=1964238.1964240","DOI":"10.1007\/978-3-642-19328-6_1"},{"key":"bibr22-1094342018762036","doi-asserted-by":"crossref","unstructured":"Son SW, Chen Z, Hendrix W, (2014) Data compression for the exascale computing era\u2014survey. Supercomputing Frontiers and Innovations 1(2). 
Available at: http:\/\/superfri.org\/superfri\/article\/view\/13.","DOI":"10.14529\/jsfi140205"},{"key":"bibr23-1094342018762036","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2017.115"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018762036","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1094342018762036","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018762036","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342018762036","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T15:14:23Z","timestamp":1740755663000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342018762036"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,3,12]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,3]]}},"alternative-id":["10.1177\/1094342018762036"],"URL":"https:\/\/doi.org\/10.1177\/1094342018762036","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,3,12]]}}}