{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,9,4]],"date-time":"2023-09-04T14:31:45Z","timestamp":1693837905182},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2008,12,13]],"date-time":"2008-12-13T00:00:00Z","timestamp":1229126400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2009,12]]},"DOI":"10.1007\/s11227-008-0259-0","type":"journal-article","created":{"date-parts":[[2008,12,12]],"date-time":"2008-12-12T10:20:06Z","timestamp":1229077206000},"page":"209-239","source":"Crossref","is-referenced-by-count":12,"title":["A fault-tolerant strategy for virtualized HPC clusters"],"prefix":"10.1007","volume":"50","author":[{"given":"John Paul","family":"Walters","sequence":"first","affiliation":[]},{"given":"Vipin","family":"Chaudhary","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,12,13]]},"reference":[{"key":"259_CR1","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1145\/1168857.1168860","volume-title":"ASPLOS-XII: proceedings of the 12th international conference on architectural support for programming languages and operating systems","author":"K Adams","year":"2006","unstructured":"Adams K, Agesen O (2006) A comparison of software and hardware techniques for x86 virtualization. In: ASPLOS-XII: proceedings of the 12th international conference on architectural support for programming languages and operating systems, 2006. ACM Press, New York, pp 2\u201313"},{"key":"259_CR2","first-page":"65","volume-title":"WWC \u201903: proceedings of the 6th international workshop on workload characterization","author":"I Ahmad","year":"2003","unstructured":"Ahmad I, Anderson JM, Holler AM, Kambo R, Makhija V (2003) An analysis of disk performance in VMware ESX server virtual machines. In: WWC \u201903: proceedings of the 6th international workshop on workload characterization, 2003. IEEE Computer Society Press, Los Alamitos, pp 65\u201376"},{"issue":"3","key":"259_CR3","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/2.825694","volume":"33","author":"ER Altman","year":"2000","unstructured":"Altman ER, Kaeli D, Sheffer Y (2000) Guest editors\u2019 introduction: welcome to the opportunities of binary translation. Computer 33(3):40\u201345","journal-title":"Computer"},{"issue":"3","key":"259_CR4","first-page":"63","volume":"5","author":"DH Bailey","year":"1991","unstructured":"Bailey DH, Barszcz E, Barton JT, Browning DS, Carter RL, Dagum L, Fatoohi RA, Frederickson PO, Lasinski TA, Schreiber RS, Simon HD, Venkatakrishnan V, Weeratunga SK (1991) The NAS parallel benchmarks. Int J High Perform Comput Appl 5(3):63\u201373","journal-title":"Int J High Perform Comput Appl"},{"key":"259_CR5","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1145\/945445.945462","volume-title":"SOSP \u201903: proceedings of the 19th symposium on operating systems principles","author":"P Barham","year":"2003","unstructured":"Barham P, Dragovic B, Fraser K, Hand S, Harris T, Ho A, Neugebauer R, Pratt I, Warfield A (2003) Xen and the art of virtualization. In: SOSP \u201903: proceedings of the 19th symposium on operating systems principles, 2003. ACM Press, New York, pp 164\u2013177"},{"issue":"3","key":"259_CR6","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1109\/TPDS.2008.14","volume":"19","author":"A Batsakis","year":"2008","unstructured":"Batsakis A, Burns R (2008) NFS-CD: write-enabled cooperative caching in NFS. IEEE Trans Parallel Distrib Syst 19(3):323\u2013333","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"2","key":"259_CR7","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1006\/jpdc.1997.1338","volume":"43","author":"A Beguelin","year":"1997","unstructured":"Beguelin A, Seligman E, Stephan P (1997) Application level fault tolerance in heterogeneous networks of workstations. J Parallel Distrib Comput 43(2):147\u2013155","journal-title":"J Parallel Distrib Comput"},{"key":"259_CR8","first-page":"1","volume-title":"SC \u201902: proceedings of the 19th annual supercomputing conference","author":"G Bosilca","year":"2002","unstructured":"Bosilca G, Bouteiller A, Cappello F, Djilali S, Fedak G, Germain C, Herault T, Lemarinier P, Lodygensky O, Magniette F, Neri V, Selikhov A (2002) MPICH-V: toward a scalable fault tolerant MPI for volatile nodes. In: SC \u201902: proceedings of the 19th annual supercomputing conference, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press, Los Alamitos, pp 1\u201318"},{"key":"259_CR9","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/781498.781513","volume-title":"PPoPP \u201903: proceedings of the 9th symposium on principles and practice of parallel programming","author":"G Bronevetsky","year":"2003","unstructured":"Bronevetsky G, Marques D, Pingali K, Stodghill P (2003) Automated application-level checkpointing of MPI programs. In: PPoPP \u201903: proceedings of the 9th symposium on principles and practice of parallel programming, 2003. ACM Press, New York, pp 84\u201394"},{"key":"259_CR10","first-page":"379","volume-title":"Proceedings of supercomputing symposium","author":"G Burns","year":"1994","unstructured":"Burns G, Daoud R, Vaigl J (1994) LAM: an open cluster environment for MPI. In: Proceedings of supercomputing symposium, 1994. IEEE Computer Society Press, Los Alamitos, pp 379\u2013386"},{"key":"259_CR11","unstructured":"Cherkasova L, Gardner R (2005) Measuring CPU overhead for I\/O processing in the Xen virtual machine monitor. In: USENIX 2005 annual technical conference, general track. USENIX Association, pp 387\u2013390"},{"key":"259_CR12","unstructured":"Clark B, Deshane T, Dow E, Evanchik S, Finlayson M, Herne J, Matthews J (2004) Xen and the art of repeated research. In: USENIX technical conference FREENIX track, 2004. USENIX Association, pp 135\u2013144"},{"key":"259_CR13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/cpe.728","volume":"15","author":"JJ Dongarra","year":"2003","unstructured":"Dongarra JJ, Luszczek P, Petitet A (2003) The LINPACK benchmark: Past, present, and future. Concurr Comput Pract Exp 15:1\u201318","journal-title":"Concurr Comput Pract Exp"},{"key":"259_CR14","unstructured":"Duell J (2002) The design and implementation of Berkeley Lab\u2019s Linux checkpoint\/restart. Technical Report LBNL-54941, Lawrence Berkeley National Lab"},{"issue":"3","key":"259_CR15","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1145\/568522.568525","volume":"34","author":"EN Elnozahy","year":"2002","unstructured":"Elnozahy EN, Alvisi L, Wang Y-M, Johnson DB (2002) A survey of rollback-recovery protocols in message-passing systems. ACM Comput Surv 34(3):375\u2013408","journal-title":"ACM Comput Surv"},{"key":"259_CR16","first-page":"1","volume-title":"CLUSTER \u201906: proceedings of the international conference on cluster computing","author":"W Emeneker","year":"2006","unstructured":"Emeneker W, Stanzione D (2006) HPC cluster readiness of Xen and user mode Linux. In: CLUSTER \u201906: proceedings of the international conference on cluster computing, 2006. IEEE Computer Society Press, Los Alamitos, pp 1\u20138"},{"issue":"6","key":"259_CR17","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/MC.1974.6323581","volume":"7","author":"RP Goldberg","year":"1974","unstructured":"Goldberg RP (1974) Survey of virtual machine research. IEEE Comput 7(6):34\u201345","journal-title":"IEEE Comput"},{"issue":"4","key":"259_CR18","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1023\/A:1024504726988","volume":"31","author":"RL Graham","year":"2003","unstructured":"Graham RL, Choi SE, Daniel DJ, Desai NN, Minnich RG, Rasmussen CE, Risinger LD, Sukalski MW (2003) A network-failure-tolerant message-passing system for terascale clusters. Int J Parallel Program 31(4):285\u2013303","journal-title":"Int J Parallel Program"},{"issue":"3","key":"259_CR19","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1177\/1094342004046045","volume":"18","author":"WD Gropp","year":"2004","unstructured":"Gropp WD, Lusk E (2004) Fault tolerance in MPI programs. Int J High Perform Comput Appl 18(3):363\u2013372","journal-title":"Int J High Perform Comput Appl"},{"key":"259_CR20","unstructured":"Hewlett-Packard. Netperf. http:\/\/www.netperf.org"},{"key":"259_CR21","unstructured":"Litzkow M, Tannenbaum T, Basney J, Livny M (1997) Checkpoint and migration of Unix processes in the Condor distributed processing system. Technical Report 1346, University of Wisconsin-Madison"},{"key":"259_CR22","unstructured":"Liu J, Huang W, Abali B, Panda DK (2006) High performance VMM-bypass I\/O in virtual machines. In: Proceedings of the USENIX annual technical conference, 2006. USENIX Association, pp 3\u201316"},{"key":"259_CR23","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1145\/1064979.1064984","volume-title":"VEE \u201905: proceedings of the 1st ACM\/USENIX international conference on virtual execution environments","author":"A Menon","year":"2005","unstructured":"Menon A, Santos JR, Turner Y, Janakiraman G, Zwaenepoel W (2005) Diagnosing performance overheads in the Xen virtual machine environment. In: VEE \u201905: proceedings of the 1st ACM\/USENIX international conference on virtual execution environments, 2005. ACM Press, New York, pp 13\u201323"},{"key":"259_CR24","doi-asserted-by":"crossref","first-page":"878","DOI":"10.1145\/169627.169855","volume-title":"SC \u201993: proceedings of the 6th annual supercomputing conference","author":"The MPI Forum","year":"1993","unstructured":"The MPI Forum (1993) MPI: A message passing interface. In: SC \u201993: proceedings of the 6th annual supercomputing conference, 1993. IEEE Computer Society Press, Los Alamitos, pp 878\u2013883"},{"key":"259_CR25","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1145\/1274971.1274978","volume-title":"ICS \u201907: proceedings of the 21st annual international conference on supercomputing","author":"AB Nagarajan","year":"2007","unstructured":"Nagarajan AB, Mueller F, Engelmann C, Scott SL (2007) Proactive fault tolerance for HPC with Xen virtualization. In: ICS \u201907: proceedings of the 21st annual international conference on supercomputing, 2007. ACM Press, New York, pp 23\u201332"},{"key":"259_CR26","unstructured":"Norcott WD, Capps D (2008) The IOZone filesystem benchmark. http:\/\/www.iozone.org"},{"key":"259_CR27","unstructured":"Plank JS, Beck M, Kingsley G, Li K (1994) Libckpt: transparent checkpointing under Unix. Technical Report UT-CS-94-242"},{"key":"259_CR28","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1145\/1272366.1272390","volume-title":"HPDC \u201907: proceedings of the international symposium on high performance distributed computing","author":"H Raj","year":"2007","unstructured":"Raj H, Schwan K (2007) High performance and scalable I\/O virtualization via self-virtualized devices. In: HPDC \u201907: proceedings of the international symposium on high performance distributed computing, 2007. IEEE Computer Society Press, Los Alamitos, pp 179\u2013188"},{"key":"259_CR29","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1109\/CLUSTR.2003.1253327","volume-title":"CLUSTER \u201903: the international conference on cluster computing","author":"F Sacerdoti","year":"2003","unstructured":"Sacerdoti F, Katz MJ, Massie ML, Culler DE (2003) Wide area cluster monitoring with Ganglia. In: CLUSTER \u201903: the international conference on cluster computing, 2003. IEEE Computer Society Press, Los Alamitos, pp 289\u2013298"},{"issue":"4","key":"259_CR30","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1177\/1094342005056139","volume":"19","author":"S Sankaran","year":"2005","unstructured":"Sankaran S, Squyres JM, Barrett B, Lumsdaine A, Duell J, Hargrove P, Roman E (2005) The LAM\/MPI checkpoint\/restart framework: system-initiated checkpointing. Int J High Perform Comput Appl 19(4):479\u2013493","journal-title":"Int J High Perform Comput Appl"},{"issue":"5","key":"259_CR31","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1109\/MC.2005.173","volume":"38","author":"JE Smith","year":"2005","unstructured":"Smith JE, Nair R (2005) The architecture of virtual machines. Computer 38(5):32\u201338","journal-title":"Computer"},{"issue":"3","key":"259_CR32","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1145\/1272998.1273025","volume":"41","author":"S Soltesz","year":"2007","unstructured":"Soltesz S, P\u00f6tzl H, Fiuczynski ME, Bavier A, Peterson L (2007) Container-based operating system virtualization: A scalable, high-performance alternative to hypervisors. SIGOPS Oper Syst Rev 41(3):275\u2013287","journal-title":"SIGOPS Oper Syst Rev"},{"issue":"5\/6","key":"259_CR33","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1147\/rd.435.0863","volume":"43","author":"L Spainhower","year":"1999","unstructured":"Spainhower L, Gregg TA (1999) IBM S\/390 parallel enterprise server G5 fault tolerance: a historical perspective. IBM J Res Devel 43(5\/6):863\u2013873","journal-title":"IBM J Res Devel"},{"key":"259_CR34","series-title":"LNCS","first-page":"379","volume-title":"Proceedings of the 10th European PVM\/MPI users\u2019 group meeting","author":"JM Squyres","year":"2003","unstructured":"Squyres JM, Lumsdaine A (2003) A component architecture for LAM\/MPI. In: Proceedings of the 10th European PVM\/MPI users\u2019 group meeting, 2003. LNCS, vol 2840. Springer, Berlin, pp 379\u2013387"},{"key":"259_CR35","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1145\/1134760.1220166","volume-title":"VEE \u201906: proceedings of the 2nd international conference on virtual execution environments","author":"S Sridhar","year":"2006","unstructured":"Sridhar S, Shapiro JS, Northup E, Bungale PP (2006) HDTrans: An open source, low-level dynamic instrumentation system. In: VEE \u201906: proceedings of the 2nd international conference on virtual execution environments, 2006. ACM Press, New York, pp 175\u2013185"},{"key":"259_CR36","unstructured":"SWSoft (2006) OpenVZ\u2014server virtualization. http:\/\/www.openvz.org\/"},{"key":"259_CR37","unstructured":"VMWare (2006) VMWare. http:\/\/www.vmware.com"},{"issue":"SI","key":"259_CR38","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1145\/844128.844146","volume":"36","author":"CA Waldspurger","year":"2002","unstructured":"Waldspurger CA (2002) Memory resource management in VMware ESX server. SIGOPS Oper Syst Rev 36(SI):181\u2013194","journal-title":"SIGOPS Oper Syst Rev"},{"key":"259_CR39","series-title":"LNCS","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/978-3-540-77220-0_26","volume-title":"HiPC \u201907: the international conference on high performance computing, 2007","author":"JP Walters","year":"2007","unstructured":"Walters JP, Chaudhary V (2007) A scalable asynchronous replication-based strategy for fault tolerant MPI applications. In: HiPC \u201907: the international conference on high performance computing, 2007. LNCS, vol 4873. Springer, Berlin, pp 257\u2013268"},{"key":"259_CR40","unstructured":"Walters JP, Chaudhary V (2008) Replication-based fault-tolerance for MPI applications. IEEE Trans Parallel Distrib Syst. IEEE computer society digital library. IEEE Computer Society, 5 December 2008. http:\/\/doi.ieeecomputersociety.org\/10.1109\/TPDS.2008.172"},{"issue":"4","key":"259_CR41","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1145\/1327512.1327513","volume":"11","author":"A Weiss","year":"2007","unstructured":"Weiss A (2007) Computing in the clouds. netWorker 11(4):16\u201325","journal-title":"netWorker"},{"key":"259_CR42","first-page":"41","volume-title":"ICS \u201999: proceedings of the 13th international conference on supercomputing","author":"FC Wong","year":"1999","unstructured":"Wong FC, Martin RP, Arpaci-Dusseau RH, Culler DE (1999) Architectural requirements and scalability of the NAS parallel benchmarks. In: ICS \u201999: proceedings of the 13th international conference on supercomputing, 1999. ACM Press, New York, pp 41\u201358"},{"key":"259_CR43","unstructured":"Zandy V (2000) Ckpt: User-level checkpointing. http:\/\/www.cs.wisc.edu\/~zandy\/ckpt\/"},{"issue":"3","key":"259_CR44","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1145\/1075395.1075402","volume":"39","author":"Y Zhang","year":"2005","unstructured":"Zhang Y, Wong D, Zheng W (2005) User-level checkpoint and recovery for LAM\/MPI. SIGOPS Oper Syst Rev 39(3):72\u201381","journal-title":"SIGOPS Oper Syst Rev"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-008-0259-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11227-008-0259-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-008-0259-0","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,6,1]],"date-time":"2019-06-01T06:23:58Z","timestamp":1559370238000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11227-008-0259-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12,13]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,12]]}},"alternative-id":["259"],"URL":"https:\/\/doi.org\/10.1007\/s11227-008-0259-0","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,12,13]]}}}