{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T06:38:47Z","timestamp":1770273527867,"version":"3.49.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,7,24]],"date-time":"2023-07-24T00:00:00Z","timestamp":1690156800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["CNS-2143256"],"award-info":[{"award-number":["CNS-2143256"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,7,31]]},"abstract":"<jats:p>\n            Real-time systems are susceptible to adversarial factors such as faults and attacks, leading to severe consequences. This paper presents an optimal checkpoint scheme to bolster fault resilience in real-time systems, addressing both logical consistency and timing correctness. First, we partition message-passing processes into a\n            <jats:bold>directed acyclic graph (DAG)<\/jats:bold>\n            based on their dependencies, ensuring checkpoint logical consistency. Then, we identify the DAG\u2019s critical path, representing the longest sequential path, and analyze the optimal checkpoint strategy along this path to minimize overall execution time, including checkpointing overhead. Upon fault detection, the system rolls back to the nearest valid checkpoints for recovery. Our algorithm derives the optimal checkpoint count and intervals, and we evaluate its performance through extensive simulations and a case study. Results show a 99.97% and 67.86% reduction in execution time compared to checkpoint-free systems in simulations and the case study, respectively. 
Moreover, our proposed strategy outperforms prior work and baseline methods, increasing deadline achievement rates by 31.41% and 2.92% for small-scale tasks and 78.53% and 4.15% for large-scale tasks.\n          <\/jats:p>","DOI":"10.1145\/3603172","type":"journal-article","created":{"date-parts":[[2023,6,1]],"date-time":"2023-06-01T11:00:15Z","timestamp":1685617215000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Optimal Checkpointing Strategy for Real-time Systems with Both Logical and Timing Correctness"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3708-9056","authenticated-orcid":false,"given":"Lin","family":"Zhang","sequence":"first","affiliation":[{"name":"Syracuse University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8961-4302","authenticated-orcid":false,"given":"Zifan","family":"Wang","sequence":"additional","affiliation":[{"name":"Syracuse University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2174-1620","authenticated-orcid":false,"given":"Fanxin","family":"Kong","sequence":"additional","affiliation":[{"name":"Syracuse University, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,7,24]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSEC.2020.3002851"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compind.2018.04.017"},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1109\/FTCS.1999.781058","volume-title":"Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No. 99CB36352)","author":"Alvisi Lorenzo","year":"1999","unstructured":"Lorenzo Alvisi, Elmootazbellah Elnozahy, Sriram Rao, Syed Amir Husain, and Asanka De Mel. 1999. An analysis of communication induced checkpointing. In Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. 
No. 99CB36352). IEEE, 242\u2013249."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3188568"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.rser.2015.12.193"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/FTCS.1997.614079"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.737697"},{"key":"e_1_3_1_9_2","article-title":"Adversarial attacks and defences: A survey","author":"Chakraborty Anirban","year":"2018","unstructured":"Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. 2018. Adversarial attacks and defences: A survey. arXiv preprint arXiv:1810.00069 (2018).","journal-title":"arXiv preprint arXiv:1810.00069"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cie.2021.107534"},{"key":"e_1_3_1_11_2","unstructured":"CRIU. 2022. Checkpoint\/Restore In Userspace (CRIU). https:\/\/criu.org\/Main_Page."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2503210.2503217"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2022.105548"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/1629435.1629463"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568525"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2004.15"},{"key":"e_1_3_1_17_2","article-title":"Resilience of cyber-physical systems","author":"Flammini Francesco","year":"2019","unstructured":"Francesco Flammini. 2019. Resilience of cyber-physical systems. 
Springer (2019).","journal-title":"Springer"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/359511.359531"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.95"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTCSA.2013.6732204"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2600595"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cps.2016.0019"},{"key":"e_1_3_1_23_2","first-page":"1","volume-title":"2008 IEEE International Symposium on Parallel and Distributed Processing","author":"Ho Justin C. Y.","year":"2008","unstructured":"Justin C. Y. Ho, Cho-Li Wang, and Francis C. M. Lau. 2008. Scalable group-based checkpoint\/restart for large-scale message-passing systems. In 2008 IEEE International Symposium on Parallel and Distributed Processing. IEEE, 1\u201312."},{"key":"e_1_3_1_24_2","article-title":"A survey of fault tolerance mechanisms and checkpoint\/restart implementations for high performance computing systems","author":"Egwutuoha Ifeanyi P.","year":"2013","unstructured":"Ifeanyi P. Egwutuoha, David Levy, Bran Selic, and Shiping Chen. 2013. A survey of fault tolerance mechanisms and checkpoint\/restart implementations for high performance computing systems. The Journal of Supercomputing (2013).","journal-title":"The Journal of Supercomputing"},{"key":"e_1_3_1_25_2","volume-title":"Design, Automation and Test in Europe","author":"Izosimov Viacheslav","year":"2005","unstructured":"Viacheslav Izosimov, Paul Pop, Petru Eles, and Zebo Peng. 2005. Design optimization of time- and cost-constrained fault-tolerant distributed embedded systems. In Design, Automation and Test in Europe. 
IEEE."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1177\/1748006X19893569"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/RELDIS.2002.1180181"},{"key":"e_1_3_1_28_2","volume-title":"Art of Computer Programming, volume 2: Seminumerical Algorithms","author":"Knuth Donald E.","year":"2014","unstructured":"Donald E. Knuth. 2014. In Art of Computer Programming, volume 2: Seminumerical Algorithms. Addison-Wesley Professional."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCPS.2018.00011"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/24.974127"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2004.1261830"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/40.735942"},{"key":"e_1_3_1_33_2","volume-title":"Design, Automation and Test in Europe","author":"Pinello Claudio","year":"2004","unstructured":"Claudio Pinello, Luca P. Carloni, and Alberto L. Sangiovanni-Vincentelli. 2004. Fault-tolerant deployment of embedded software for cost-sensitive real-time feedback-control applications. In Design, Automation and Test in Europe. IEEE."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.5555\/230303"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1026589200419"},{"key":"e_1_3_1_36_2","first-page":"1","volume-title":"2020 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)","author":"Sahoo Siva Satyendra","year":"2020","unstructured":"Siva Satyendra Sahoo, Bharadwaj Veeravalli, and Akash Kumar. 2020. Markov chain-based modeling and analysis of checkpointing with rollback recovery for efficient DSE in soft real-time systems. In 2020 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT). 
IEEE, 1\u20136."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2015.2512839"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1987.5009472"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/71.382324"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2699401"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.micpro.2020.103201"},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","unstructured":"Yi Luo and D. Manivannan. 2011. Theoretical and experimental evaluation of communication-induced checkpointing protocols in FE and FLazy-E families. Performance Evaluation 68, 5 (2011), 429\u2013445.","DOI":"10.1016\/j.peva.2011.01.005"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/2968455.2968515"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS49844.2020.00028"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477010"},{"key":"e_1_3_1_46_2","volume-title":"2023 IEEE 29th Real-Time and Embedded Technology and Applications Symposium (RTAS)","author":"Zhang Lin","year":"2023","unstructured":"Lin Zhang, Kaustubh Sridhar, Mengyu Liu, Pengyuan Lu, Xin Chen, Fanxin Kong, Oleg Sokolsky, and Insup Lee. 2023. Real-time data-predictive attack-recovery for complex cyber-physical systems. In 2023 IEEE 29th Real-Time and Embedded Technology and Applications Symposium (RTAS)."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530555"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/DATE.2004.1269050"},{"key":"e_1_3_1_49_2","volume-title":"12th IEEE Real-Time and Embedded Technology and Applications Symposium","author":"Zhu Dakai","year":"2006","unstructured":"Dakai Zhu. 2006. Reliability-aware dynamic energy management in dependable embedded real-time systems. 
In 12th IEEE Real-Time and Embedded Technology and Applications Symposium."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603172","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3603172","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3603172","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:49:10Z","timestamp":1750286950000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603172"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,24]]},"references-count":48,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,7,31]]}},"alternative-id":["10.1145\/3603172"],"URL":"https:\/\/doi.org\/10.1145\/3603172","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,24]]},"assertion":[{"value":"2022-11-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-12","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}