{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T07:07:43Z","timestamp":1761808063787,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2016,7,20]],"date-time":"2016-07-20T00:00:00Z","timestamp":1468972800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2016,8,8]]},"abstract":"<jats:p>In this article, we combine the traditional checkpointing and rollback recovery strategies with verification mechanisms to cope with both fail-stop and silent errors. The objective is to minimize makespan and\/or energy consumption. For divisible load applications, we use first-order approximations to find the optimal checkpointing period to minimize execution time, with an additional verification mechanism to detect silent errors before each checkpoint, hence extending the classical formula by Young and Daly for fail-stop errors only. We further extend the approach to include intermediate verifications, and to consider a bicriteria problem involving both time and energy (linear combination of execution time and energy consumption). Then, we focus on application workflows whose dependence graph is a linear chain of tasks. Here, we determine the optimal checkpointing and verification locations, with or without intermediate verifications, for the bicriteria problem. 
Rather than using a single speed during the whole execution, we further introduce a new execution scenario, which allows for changing the execution speed via Dynamic Voltage and Frequency Scaling (DVFS). In this latter scenario, we determine the optimal checkpointing and verification locations, as well as the optimal speed pairs for each task segment between any two consecutive checkpoints. Finally, we conduct an extensive set of simulations to support the theoretical study, and to assess the performance of each algorithm, showing that the best overall performance is achieved under the most flexible scenario using intermediate verifications and different speeds.<\/jats:p>","DOI":"10.1145\/2897189","type":"journal-article","created":{"date-parts":[[2016,7,21]],"date-time":"2016-07-21T15:13:24Z","timestamp":1469114004000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Assessing General-Purpose Algorithms to Cope with Fail-Stop and Silent Errors"],"prefix":"10.1145","volume":"3","author":[{"given":"Anne","family":"Benoit","sequence":"first","affiliation":[{"name":"\u00c9cole Normale Sup\u00e9rieure de Lyon, CNRS & INRIA, France"}]},{"given":"Aur\u00e9lien","family":"Cavelan","sequence":"additional","affiliation":[{"name":"\u00c9cole Normale Sup\u00e9rieure de Lyon, CNRS & INRIA, France"}]},{"given":"Yves","family":"Robert","sequence":"additional","affiliation":[{"name":"\u00c9cole Normale Sup\u00e9rieure de Lyon, CNRS & INRIA, France, and University of Tennessee Knoxville"}]},{"given":"Hongyang","family":"Sun","sequence":"additional","affiliation":[{"name":"\u00c9cole Normale Sup\u00e9rieure de Lyon, CNRS & INRIA, 
France"}]}],"member":"320","published-online":{"date-parts":[[2016,7,20]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1290672.1290686"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10009-012-0263-9"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/PRDC.2013.10"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/HiPC.2012.6507482"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1206035.1206038"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342014532297"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2008.12.002"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063428"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375527.1375552"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/214451.214456"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442533"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1134241.1708449"},{"key":"e_1_2_1_13_1","first-page":"1","article-title":"Combined DVFS and mapping exploration for lifetime and soft-error susceptibility improvement in MPSoCs","volume":"61","author":"Das A.","year":"2014","unstructured":"A. Das , A. Kumar , B. Veeravalli , C. Bolchini , and A. Miele . 2014 . Combined DVFS and mapping exploration for lifetime and soft-error susceptibility improvement in MPSoCs . In Proceedings of the Conference on Design, Automation & Test in Europe (DATE). 61 : 1 -- 61 :6. A. Das, A. Kumar, B. Veeravalli, C. Bolchini, and A. Miele. 2014. Combined DVFS and mapping exploration for lifetime and soft-error susceptibility improvement in MPSoCs. In Proceedings of the Conference on Design, Automation & Test in Europe (DATE). 
61:1--61:6.","journal-title":"Proceedings of the Conference on Design, Automation & Test in Europe (DATE)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/IRPS.2011.5784522"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/375827.375843"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2318857.2254778"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2012.56"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568525"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/872035.872088"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/957717.957772"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389102"},{"key":"e_1_2_1_22_1","volume-title":"Heroux and Mark Hoemmen","author":"Michael","year":"2011","unstructured":"Michael A. Heroux and Mark Hoemmen . 2011 . Fault-Tolerant Iterative Methods via Selective Reliability. Research report SAND2011-3915 C. Sandia National Laboratories . Michael A. Heroux and Mark Hoemmen. 2011. Fault-Tolerant Iterative Methods via Selective Reliability. Research report SAND2011-3915 C. Sandia National Laboratories."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2005.3"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676475"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2248487.2150989"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Thomas H\u00e9rault and Yves Robert (Eds.). 2015. Fault-Tolerance Techniques for High-Performance Computing. Springer Verlag.   Thomas H\u00e9rault and Yves Robert (Eds.). 2015. Fault-Tolerance Techniques for High-Performance Computing. 
Springer Verlag.","DOI":"10.1007\/978-3-319-20943-2"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33486-3_31"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465813.2465821"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.62.0200"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2010.18"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503266"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/16.278509"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2006.22"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ITHERM.2008.4544393"},{"key":"e_1_2_1_35_1","volume-title":"Ali Javadzadeh Boloori, and Javid Taheri.","author":"Rizvandi Nikzad Babaii","year":"2012","unstructured":"Nikzad Babaii Rizvandi , Albert Y. Zomaya , Young Choon Lee , Ali Javadzadeh Boloori, and Javid Taheri. 2012 . Multiple frequency selection in DVFS-enabled processors to minimize energy consumption. In Energy-Efficient Distributed Computing Systems, A. Y. Zomaya and Y. C. Lee (Eds.). John Wiley & Sons , Inc., Hoboken, NJ. Nikzad Babaii Rizvandi, Albert Y. Zomaya, Young Choon Lee, Ali Javadzadeh Boloori, and Javid Taheri. 2012. Multiple frequency selection in DVFS-enabled processors to minimize energy consumption. In Energy-Efficient Distributed Computing Systems, A. Y. Zomaya and Y. C. Lee (Eds.). 
John Wiley & Sons, Inc., Hoboken, NJ."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2530268.2530272"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503228"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304588"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1137\/0213039"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/795662.796264"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/361147.361115"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2008.4751927"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD.2004.1382539"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.401.0051"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/4.658626"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.401.0003"}],"container-title":["ACM Transactions on Parallel 
Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2897189","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2897189","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:39:02Z","timestamp":1750221542000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2897189"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,20]]},"references-count":46,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2016,8,8]]}},"alternative-id":["10.1145\/2897189"],"URL":"https:\/\/doi.org\/10.1145\/2897189","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2016,7,20]]},"assertion":[{"value":"2014-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-07-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}