{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:51:04Z","timestamp":1750308664151,"version":"3.41.0"},"reference-count":72,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2007,8,1]],"date-time":"2007-08-01T00:00:00Z","timestamp":1185926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2007,8]]},"abstract":"<jats:p>Many applications demand availability. Unfortunately, software failures greatly reduce system availability. Prior work on surviving software failures suffers from one or more of the following limitations: required application restructuring, inability to address deterministic software bugs, unsafe speculation on program execution, and long recovery time.<\/jats:p>\n          <jats:p>\n            This paper proposes an innovative\n            <jats:italic>safe<\/jats:italic>\n            technique, called Rx, which can quickly recover programs from many types of software bugs, both deterministic and nondeterministic. Our idea, inspired by allergy treatment in real life, is to roll back the program to a recent checkpoint upon a software failure, and then to reexecute the program in a\n            <jats:italic>modified<\/jats:italic>\n            environment. We base this idea on the observation that many bugs are correlated with the execution environment, and therefore can be avoided by removing the \u201callergen\u201d from the environment. Rx requires few to no modifications to applications and provides programmers with additional feedback for bug diagnosis.\n          <\/jats:p>\n          <jats:p>We have implemented Rx on Linux. 
Our experiments with five server applications that contain seven bugs of various types show that Rx can survive six out of seven software failures and provide transparent fast recovery within 0.017--0.16 seconds, 21--53 times faster than the whole program restart approach for all but one case (CVS). In contrast, the two tested alternatives, a whole program restart approach and a simple rollback and reexecution without environmental changes, cannot successfully recover the four servers (Squid, Apache, CVS, and ypserv) that contain deterministic bugs, and have only a 40% recovery rate for the server (MySQL) that contains a nondeterministic concurrency bug. Additionally, Rx's checkpointing system is lightweight, imposing small time and space overheads.<\/jats:p>","DOI":"10.1145\/1275517.1275519","type":"journal-article","created":{"date-parts":[[2007,9,14]],"date-time":"2007-09-14T13:44:55Z","timestamp":1189777495000},"page":"7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["Rx"],"prefix":"10.1145","volume":"25","author":[{"given":"Feng","family":"Qin","sequence":"first","affiliation":[{"name":"The Ohio State University, Columbus, OH"}]},{"given":"Joseph","family":"Tucek","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, IL"}]},{"given":"Yuanyuan","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, IL"}]},{"given":"Jagadeesan","family":"Sundaresan","sequence":"additional","affiliation":[{"name":"Citadel Investment Group, Chicago, IL"}]}],"member":"320","published-online":{"date-parts":[[2007,8]]},"reference":[{"volume-title":"Proceedings of the 4th International WWW Conference.","author":"Abrams M.","key":"e_1_2_1_1_1","unstructured":"Abrams, M., Standridge, C. R., Abdulla, G., Williams, S., and Fox, E. A. 1995. Caching proxies: Limitations and potentials. 
In Proceedings of the 4th International WWW Conference."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/248052.248061"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.2185"},{"volume-title":"Proceedings of the International Conference on Dependable Systems and Networks.","author":"Amza C.","key":"e_1_2_1_4_1","unstructured":"Amza, C., Cox, A., and Zwaenepoel, W. 2000. Data replication strategies for fault tolerance and availability on commodity clusters. In Proceedings of the International Conference on Dependable Systems and Networks."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1985.231893"},{"volume-title":"Proceedings of the 1st International Computer Software and Applications Conference.","author":"Avizienis A.","key":"e_1_2_1_6_1","unstructured":"Avizienis, A. and Chen, L. 1977. On the implementation of N-version programming for software fault tolerance during execution. In Proceedings of the 1st International Computer Software and Applications Conference."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/800216.806587"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1133981.1134000"},{"key":"e_1_2_1_9_1","unstructured":"Birman K.
P. 1996. Building Secure and Reliable Network Applications. Manning ISBN: 1-884777-29-5 Chapter 19."},{"volume-title":"Proceedings of the International Computer Performance and Dependability Symposium.","author":"Bobbio A.","key":"e_1_2_1_10_1","unstructured":"Bobbio, A. and Sereno, M. 1998. Fine grained software rejuvenation models. In Proceedings of the International Computer Performance and Dependability Symposium."},{"volume-title":"Proceedings of the International Conference on Autonomic Computing.","author":"Bohra A.","key":"e_1_2_1_11_1","unstructured":"Bohra, A., Neamtiu, I., Gallard, P., Sultan, F., and Iftode, L. 2004. Remote repair of operating system state using backdoors. In Proceedings of the International Conference on Autonomic Computing."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/800217.806617"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/58564.58565"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/225535.225538"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-024X(200006)30:7%3C775::AID-SPE309%3E3.0.CO;2-H"},{"volume-title":"Proceedings of the International Conference on Dependable Systems and Networks.","author":"Candea G.","key":"e_1_2_1_16_1","unstructured":"Candea, G., Cutler, J., Fox, A., Doshi, R., Garg, P., and Gowda, R. 2002. 
Reducing recovery time in a small recursively restartable system. In Proceedings of the International Conference on Dependable Systems and Networks."},{"volume-title":"Proceedings of the 6th Symposium on Operating System Design and Implementation.","author":"Candea G.","key":"e_1_2_1_17_1","unstructured":"Candea, G., Kawamoto, S., Fujiki, Y., Friedman, G., and Fox, A. 2004. Microreboot---A technique for cheap recovery. In Proceedings of the 6th Symposium on Operating System Design and Implementation."},{"volume-title":"Proceedings of the 3rd Symposium on Operating System Design and Implementation.","author":"Castro M.","key":"e_1_2_1_18_1","unstructured":"Castro, M. and Liskov, B. 1999. Practical byzantine fault tolerance. In Proceedings of the 3rd Symposium on Operating System Design and Implementation."},{"volume-title":"Proceedings of the 4th Symposium on Operating System Design and Implementation.","author":"Castro M.","key":"e_1_2_1_19_1","unstructured":"Castro, M. and Liskov, B. 2000. Proactive recovery in a Byzantine-Fault-Tolerant system. In Proceedings of the 4th Symposium on Operating System Design and Implementation."},{"key":"e_1_2_1_20_1","unstructured":"CERT\/CC. Advisories. 
http:\/\/www.cert.org\/advisories\/."},{"volume-title":"Proceedings of the International Conference on Dependable Systems and Networks.","author":"Chandra S.","key":"e_1_2_1_21_1","unstructured":"Chandra, S. and Chen, P. M. 2000. Whither generic recovery from application faults? A fault study using open-source software. In Proceedings of the International Conference on Dependable Systems and Networks."},{"volume-title":"Proceedings of the 13th International Symposium on Software Reliability Engineering.","author":"Chandra S.","key":"e_1_2_1_22_1","unstructured":"Chandra, S. and Chen, P. M. 2002. The impact of recovery mechanisms on the likelihood of saving corrupted state. In Proceedings of the 13th International Symposium on Software Reliability Engineering."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/509593.509626"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781157"},{"volume-title":"Proceedings of the 7th USENIX Security Symposium.","author":"Cowan C.","key":"e_1_2_1_25_1","unstructured":"Cowan, C., Pu, C., Maier, D., Walpole, J., Bakke, P., Beattie, S., Grier, A., Wagle, P., Zhang, Q., and Hinton, H. 1998. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. 
In Proceedings of the 7th USENIX Security Symposium."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/1060289.1060309"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/568522.568525"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/151257.151259"},{"volume-title":"Proceedings of the Annual Conference on Computer Assurance.","author":"Garg S.","key":"e_1_2_1_29_1","unstructured":"Garg, S., Puliafito, A., Telek, M., and Trivedi, K. S. 1997. On the analysis of software rejuvenation policies. In Proceedings of the Annual Conference on Computer Assurance."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 5th Symposium on Reliable Distributed Systems.","author":"Gray J.","year":"1986","unstructured":"Gray, J. 1986. Why do computers stop and what can be done about it? In Proceedings of the 5th Symposium on Reliable Distributed Systems."},{"volume-title":"Proceedings of the International Conference on Dependable Systems and Networks.","author":"Gu W.","key":"e_1_2_1_31_1","unstructured":"Gu, W., Kalbarczyk, Z., Iyer, R. K., and Yang, Z.-Y. 2003. Characterization of Linux kernel behavior under errors. 
In Proceedings of the International Conference on Dependable Systems and Networks."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/512529.512539"},{"volume-title":"Proceedings of the USENIX Winter Technical Conference.","author":"Hastings R.","key":"e_1_2_1_33_1","unstructured":"Hastings, R. and Joyce, B. 1992. Purify: Fast detection of memory leaks and access errors. In Proceedings of the USENIX Winter Technical Conference."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781150"},{"volume-title":"Proceedings of the 25th Annual International Symposium on Fault-Tolerant Computing.","author":"Huang Y.","key":"e_1_2_1_35_1","unstructured":"Huang, Y., Kintala, C., Kolettis, N., and Fulton, N. D. 1995. Software rejuvenation: Analysis, module and applications. In Proceedings of the 25th Annual International Symposium on Fault-Tolerant Computing."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/62546.62575"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0196-6774(90)90022-7"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/99163.99173"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1181309.1181314"},{"volume-title":"Proceedings of the 4th Symposium on Operating System Design and Implementation.","author":"Lowell D. E.","key":"e_1_2_1_40_1","unstructured":"Lowell, D. 
E., Chandra, S., and Chen, P. M. 2000. Exploring failure transparency and the limits of generic recovery. In Proceedings of the 4th Symposium on Operating System Design and Implementation."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/268998.266665"},{"key":"e_1_2_1_42_1","unstructured":"Lowell D. E. and Chen P. M. 1998. Discount checking: Transparent low-overhead recovery for general applications. Tech. rep. CSE-TR-410-99 University of Michigan."},{"volume-title":"Proceedings of the 1st International WWW Conference.","author":"Luotonen A.","key":"e_1_2_1_43_1","unstructured":"Luotonen, A. and Altis, K. 1994. World-wide web proxies. In Proceedings of the 1st International WWW Conference."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/306225.306235"},{"key":"e_1_2_1_45_1","unstructured":"Patterson D. Brown A. Broadwell P. Candea G. Chen M. Cutler J. Enriquez P. Fox A. Kiciman E. Merzbacher M. Oppenheimer D. Sastry N. Tetzlaff W. Traupman J. and Treuhaft N. 2002. Recovery oriented computing (ROC): Motivation definition techniques and case studies. Tech. rep. 
UCB\/\/CSD-02-1175 University of California Berkeley."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.730527"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859632"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2005.29"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1975.6312842"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/356725.356729"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 6th Symposium on Operating System Design and Implementation.","author":"Rinard M.","year":"2004","unstructured":"Rinard, M., Cadar, C., Dumitran, D., Roy, D. M., Leu, T., and Beebee, Jr., W. S. 2004. Enhancing server availability and security through failure-oblivious computing. In Proceedings of the 6th Symposium on Operating System Design and Implementation."},{"key":"e_1_2_1_52_1","unstructured":"Rocco E. and Igou B. 2001. The critical role of high-availability infrastructure support services."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/502034.502037"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/231379.231432"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/319151.319159"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/265924.265927"},{"volume-title":"Assessing the costs of application downtime","author":"Scott D.","key":"e_1_2_1_57_1","unstructured":"Scott, D. 1998. Assessing the costs of application downtime. 
Gartner Group."},{"volume-title":"Proceedings of the USENIX Annual Technical Conference.","author":"Sidiroglou S.","key":"e_1_2_1_58_1","unstructured":"Sidiroglou, S., Locasto, M. E., Boyd, S. W., and Keromytis, A. D. 2005. Building a reactive immune system for software services. In Proceedings of the USENIX Annual Technical Conference."},{"volume-title":"Proceedings of the USENIX Annual Technical Conference.","author":"Srinivasan S.","key":"e_1_2_1_59_1","unstructured":"Srinivasan, S., Andrews, C., Kandula, S., and Zhou, Y. 2004. Flashback: A light-weight extension for rollback and deterministic replay for software debugging. In Proceedings of the USENIX Annual Technical Conference."},{"volume-title":"Proceedings of the 11th USENIX Security Symposium.","author":"Staniford S.","key":"e_1_2_1_60_1","unstructured":"Staniford, S., Paxson, V., and Weaver, N. 2002. How to own the internet in your spare time. In Proceedings of the 11th USENIX Security Symposium."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1571-0661(04)80582-6"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3959.3962"},{"volume-title":"Proceedings of the 21st Annual International Symposium on Fault-Tolerant Computing.","author":"Sullivan M.","key":"e_1_2_1_63_1","unstructured":"Sullivan, M. and Chillarege, R. 1991. 
Software defects and their impact on system availability---A study of field failures in operating systems. In Proceedings of the 21st Annual International Symposium on Fault-Tolerant Computing."},{"volume-title":"Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing.","author":"Sullivan M.","key":"e_1_2_1_64_1","unstructured":"Sullivan, M. and Chillarege, R. 1992. A comparison of software defects in database management systems and operating systems. In Proceedings of the 22nd Annual International Symposium on Fault-Tolerant Computing."},{"volume-title":"Proceedings of the 6th Symposium on Operating System Design and Implementation.","author":"Swift M. M.","key":"e_1_2_1_65_1","unstructured":"Swift, M. M., Annamalai, M., Bershad, B. N., and Levy, H. M. 2004. Recovering device drivers. In Proceedings of the 6th Symposium on Operating System Design and Implementation."},{"key":"e_1_2_1_66_1","volume-title":"Webstone: The first generation in http server benchmarking","author":"Trent G.","year":"1995","unstructured":"Trent, G. and Sake, M. 1995. Webstone: The first generation in http server benchmarking. 
http:\/\/www.cs.virginia.edu\/~zaher\/classes\/cs851\/papers\/webstone."},{"volume-title":"Proceedings of the 2nd USENIX Windows NT Symposium.","author":"Vogels W.","key":"e_1_2_1_67_1","unstructured":"Vogels, W., Dumitriu, D., Agrawal, A., Chia, T., and Guo, K. 1998. Scalability of the Microsoft cluster service. In Proceedings of the 2nd USENIX Windows NT Symposium."},{"volume-title":"Proceedings of the 28th Annual International Symposium on Fault-Tolerant Computing.","author":"Vogels W.","key":"e_1_2_1_68_1","unstructured":"Vogels, W., Dumitriu, D., Birman, K., Gamache, R., Massa, M., Short, R., Vert, J., Barrera, J., and Gray, J. 1998. The design and architecture of the Microsoft cluster service. In Proceedings of the 28th Annual International Symposium on Fault-Tolerant Computing."},{"volume-title":"Proceedings of the 23rd Annual International Symposium on Fault-Tolerant Computing.","author":"Wang Y.-M.","key":"e_1_2_1_69_1","unstructured":"Wang, Y.-M., Huang, Y., and Fuchs, W. K. 1993. Progressive retry for software error recovery in distributed systems. 
In Proceedings of the 23rd Annual International Symposium on Fault-Tolerant Computing."},{"volume-title":"Proceedings of the 25th Annual International Symposium on Fault-Tolerant Computing.","author":"Wang Y.-M.","key":"e_1_2_1_70_1","unstructured":"Wang, Y.-M., Huang, Y., Vo, K.-P., Chung, P.-Y., and Kintala, C. 1995a. Checkpointing and its applications. In Proceedings of the 25th Annual International Symposium on Fault-Tolerant Computing."},{"volume-title":"Proceedings of the 25th Annual International Symposium on Fault-Tolerant Computing.","author":"Wang Y.-M.","key":"e_1_2_1_71_1","unstructured":"Wang, Y.-M., Huang, Y., Vo, K.-P., Chung, P.-Y., and Kintala, C. M. R. 1995b. Checkpointing and its applications. 
In Proceedings of the 25th Annual International Symposium on Fault-Tolerant Computing."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305215"}],"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1275517.1275519","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1275517.1275519","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T20:00:30Z","timestamp":1750276830000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1275517.1275519"}},"subtitle":["Treating bugs as allergies\u2014a safe method to survive software failures"],"short-title":[],"issued":{"date-parts":[[2007,8]]},"references-count":72,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2007,8]]}},"alternative-id":["10.1145\/1275517.1275519"],"URL":"https:\/\/doi.org\/10.1145\/1275517.1275519","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"type":"print","value":"0734-2071"},{"type":"electronic","value":"1557-7333"}],"subject":[],"published":{"date-parts":[[2007,8]]},"assertion":[{"value":"2007-08-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}