{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T05:46:03Z","timestamp":1740807963616,"version":"3.38.0"},"reference-count":35,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2009,7,20]],"date-time":"2009-07-20T00:00:00Z","timestamp":1248048000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p> Future missions of deep-space exploration face the challenge of building more capable autonomous spacecraft and planetary rovers. Given the communication latencies and bandwidth limitations for such missions, the need for increased autonomy becomes mandatory, along with the requirement for enhanced on-board computational capabilities while in deep-space or time-critical situations. This will result in dramatic changes in the way missions are conducted and supported by on-board computing systems. Specifically, the traditional approach of relying exclusively on radiation-hardened hardware and modular redundancy will not be able to deliver the required computational power. As a consequence, such systems are expected to include high-capability low-power components based on emerging commercial-off-the-shelf (COTS) multi-core technology. In this paper we describe the design of a generic framework for introspection that supports runtime monitoring and analysis of program execution as well as a feedback-oriented recovery from faults. 
Our focus is on providing flexible software fault tolerance matched to the requirements and properties of applications by exploiting knowledge that is either contained in an application knowledge base, provided by users, or automatically derived from specifications. A prototype implementation is currently in progress at the Jet Propulsion Laboratory, California Institute of Technology, targeting a cluster of cell broadband engines. <\/jats:p>","DOI":"10.1177\/1094342009106190","type":"journal-article","created":{"date-parts":[[2009,7,20]],"date-time":"2009-07-20T11:28:49Z","timestamp":1248089329000},"page":"227-241","source":"Crossref","is-referenced-by-count":5,"title":["Adaptive Fault Tolerance for Scalable Cluster Computing in Space"],"prefix":"10.1177","volume":"23","author":[{"given":"Mark L.","family":"James","sequence":"first","affiliation":[{"name":"JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY,\rPASADENA, CA 91109, USA,"}]},{"given":"Andrew A.","family":"Shapiro","sequence":"additional","affiliation":[{"name":"JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY,\rPASADENA, CA 91109, USA,"}]},{"given":"Paul L.","family":"Springer","sequence":"additional","affiliation":[{"name":"JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY,\rPASADENA, CA 91109, USA,"}]},{"given":"Hans P.","family":"Zima","sequence":"additional","affiliation":[{"name":"JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY,\rPASADENA, CA 91109, USA,"}]}],"member":"179","published-online":{"date-parts":[[2009,7,20]]},"reference":[{"doi-asserted-by":"publisher","key":"atypb1","DOI":"10.1109\/MC.2007.213"},{"volume-title":"The Design and Analysis of Computer Algorithms","year":"1974","author":"Aho, A.V.","key":"atypb2"},{"volume-title":"Proceedings of Modelling and Verification of Parallel Processes (MOVEP'2k), Lecture Notes in Computer Science 2067","author":"Amnell, 
T.","key":"atypb3"},{"doi-asserted-by":"publisher","key":"atypb4","DOI":"10.1002\/rob.20192"},{"volume-title":"Proceedings of the 8th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (i-SAIRAS 2005), ESA SP-603","author":"Castano, R.","key":"atypb5"},{"issue":"2","key":"atypb6","first-page":"69","volume":"2","author":"Dechant, D.J.","year":"1990","journal-title":"Advances in VLSI Computer Systems"},{"doi-asserted-by":"publisher","key":"atypb7","DOI":"10.1109\/MC.1980.1653418"},{"unstructured":"Dijkstra, E.W. 1972. Notes on structured programming. In Dahl, O.J., Dijkstra, E. W. and Hoare, C. (eds), Structured Programming, Academic Press, London, pp. 1-82","key":"atypb8"},{"doi-asserted-by":"publisher","key":"atypb9","DOI":"10.1147\/sj.451.0059"},{"key":"atypb10","volume-title":"Verification and validation","author":"Gannon, J.D.","year":"2004","edition":"2"},{"doi-asserted-by":"publisher","key":"atypb11","DOI":"10.1109\/AERO.2005.1559341"},{"volume-title":"The Design and Implementation of a Fault-tolerant Cluster Manager. Technical Report CSD-010040","year":"2001","author":"Goldberg, D.","key":"atypb12"},{"volume-title":"Proceedings Verified Software: Theories, Tools, Experiments (VSTTE'05)","author":"Havelund, K.","key":"atypb13"},{"volume-title":"The SPIN Model Checker. 
Primer and Reference Manual","year":"2003","author":"Holzmann, G.J.","key":"atypb14"},{"key":"atypb15","volume-title":"Introduction to Automata Theory, Languages, and Computation","author":"Hopcroft, J.E.","year":"2006","edition":"3"},{"volume-title":"Proceedings of the Fault-Tolerant Computing Symposium (FTCS-21)","author":"Iacoponi, M.J.","key":"atypb16"},{"doi-asserted-by":"publisher","key":"atypb17","DOI":"10.1109\/MSP.2007.23"},{"volume-title":"Proceedings 2008 IEEE Aerospace Conference","author":"James, M.L.","key":"atypb18"},{"doi-asserted-by":"publisher","key":"atypb19","DOI":"10.1147\/rd.515.0503"},{"doi-asserted-by":"publisher","key":"atypb20","DOI":"10.1109\/71.774907"},{"doi-asserted-by":"publisher","key":"atypb21","DOI":"10.1145\/357172.357176"},{"volume-title":"Cluster'02: Proceedings of the IEEE International Conference on Cluster Computing","author":"Li, M.","key":"atypb22"},{"doi-asserted-by":"publisher","key":"atypb23","DOI":"10.1109\/5254.722359"},{"volume-title":"Program Model Checking. A Practitioner's Guide, Version 1.0","year":"2007","author":"Mansouri-Samani, M.","key":"atypb24"},{"volume-title":"Fault Tolerance for Multicomputers: The Application Oriented Paradigm","year":"1997","author":"McMillin, B.M.","key":"atypb25"},{"volume-title":"Proceedings of the 2005 29th Annual IEEE\/NASA Software Engineering Workshop (SEW'05)","author":"Mehlitz, P.C.","key":"atypb26"},{"doi-asserted-by":"publisher","key":"atypb27","DOI":"10.1007\/978-3-662-03811-6"},{"volume-title":"Proceedings 2006 IEEE Aerospace Conference","author":"Ramos, J.","key":"atypb28"},{"volume-title":"Proceedings 2008 IEEE Aerospace Conference","author":"Rice, E.B.","key":"atypb29"},{"volume-title":"Proceedings 2007 IEEE Aerospace Conference","author":"Samson, J.","key":"atypb30"},{"volume-title":"Fault-Tolerant Computing for Radiation Environments. Ph.D. 
Thesis (Technical Report 01-6), Center for Reliable Computing","year":"2001","author":"Shirvani, P.P.","key":"atypb31"},{"volume-title":"Proceedings of the Digital Avionics Systems Conference","author":"Some, R.","key":"atypb32"},{"unstructured":"Wooldridge, M. 1999. Intelligent agents. In Weiss, G. (ed.), Multiagent Systems. A Modern Approach to Distributed Artificial Intelligence, SAGE Publications, Cambridge, MA, pp. 27-78.","key":"atypb33"},{"doi-asserted-by":"crossref","unstructured":"Zima, H.P. 2004. Introspection in a massively parallel PIM-based architecture. In Joubert, G. R. (ed.), Advances in Parallel Computing, Vol. 13, Elsevier, Amsterdam, pp. 441-448.","key":"atypb34","DOI":"10.1016\/S0927-5452(04)80057-1"},{"volume-title":"Supercompilers for Parallel and Vector Computers, Frontier","year":"1991","author":"Zima, H.P.","key":"atypb35"}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342009106190","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1094342009106190","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T17:52:32Z","timestamp":1740765152000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1094342009106190"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,7,20]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.1177\/1094342009106190"],"URL":"https:\/\/doi.org\/10.1177\/1094342009106190","relation":{},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"type":"print","value":"1094-3420"},{"type":"electronic","value":"1741-2846"}],"subject":[],"published":{"date-parts":[[2009,7,20]]}}}