{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:41:51Z","timestamp":1750308111136,"version":"3.41.0"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","license":[{"start":{"date-parts":[[2007,2,9]],"date-time":"2007-02-09T00:00:00Z","timestamp":1170979200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ACM J. Exp. Algorithmics"],"published-print":{"date-parts":[[2007,2,9]]},"abstract":"<jats:p>Distributed shared memory (DSM) creates an abstraction of a physical shared memory that parallel programmers can access. Most recent software DSM systems provide relaxed-memory models that guarantee consistency only at synchronization operations, such as locks and barriers. As the main goal of DSM systems is to provide support for long-term computation-intensive applications, checkpointing and recovery mechanisms are highly desirable. This article presents and evaluates the integration of a coordinated checkpointing mechanism to the barrier primitive that is usually provided with many DSM systems. 
Our results on some popular benchmarks and a real parallel application show that the overhead introduced during the failure-free execution is often small.<\/jats:p>","DOI":"10.1145\/1187436.1216586","type":"journal-article","created":{"date-parts":[[2010,4,7]],"date-time":"2010-04-07T02:56:32Z","timestamp":1270608992000},"source":"Crossref","is-referenced-by-count":1,"title":["Integrating coordinated checkpointing and recovery mechanisms into DSM synchronization barriers"],"prefix":"10.1145","volume":"11","author":[{"given":"Azzedine","family":"Boukerche","sequence":"first","affiliation":[{"name":"University of Ottawa, Canada"}]},{"given":"Alba Cristina Magalhaes Alves","family":"De Melo","sequence":"additional","affiliation":[{"name":"University of Brasilia, Brasilia, Brazil"}]}],"member":"320","published-online":{"date-parts":[[2007,2,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.546611"},{"volume-title":"Proceedings of the 3rd IEEE Symposium on High Performance Computer Architecture, (Feb.) San Antonio, TX. IEEE Computer Society, Washington, D.C. 261--271","author":"Amza C.","key":"e_1_2_1_2_1","unstructured":"Amza , C. , Cox , A. , Dwarkakas , S. , and Zwaenenpoel , W . 1997. Software DSM protocols that adapt between single writer and multiple writer . In Proceedings of the 3rd IEEE Symposium on High Performance Computer Architecture, (Feb.) San Antonio, TX. IEEE Computer Society, Washington, D.C. 261--271 .]] Amza, C., Cox, A., Dwarkakas, S., and Zwaenenpoel, W. 1997. Software DSM protocols that adapt between single writer and multiple writer. In Proceedings of the 3rd IEEE Symposium on High Performance Computer Architecture, (Feb.) San Antonio, TX. IEEE Computer Society, Washington, D.C. 261--271.]]"},{"volume-title":"Proceedings of the IEEE International Symposium on Cluster Computing and the Grid, (Apr.)","author":"Badrinath R.","key":"e_1_2_1_3_1","unstructured":"Badrinath , R. and Morin , C . 2004. 
Software DSM protocols that adapt between single writer and multiple writer . In Proceedings of the IEEE International Symposium on Cluster Computing and the Grid, (Apr.) . Chicago, IL, IEEE Computer Society, Washington, D.C. 459--466.]] Badrinath, R. and Morin, C. 2004. Software DSM protocols that adapt between single writer and multiple writer. In Proceedings of the IEEE International Symposium on Cluster Computing and the Grid, (Apr.). Chicago, IL, IEEE Computer Society, Washington, D.C. 459--466.]]"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1177\/109434209100500306","article-title":"The NAS parallel benchmarks","volume":"5","author":"Bailey D.","year":"1991","unstructured":"Bailey , D. , 1991 . The NAS parallel benchmarks . The International Journal of Supercomputing and Applications 5 , 3, 63 -- 73 .]] Bailey, D., et al. 1991. The NAS parallel benchmarks. The International Journal of Supercomputing and Applications 5, 3, 63--73.]]","journal-title":"The International Journal of Supercomputing and Applications"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/99163.99182"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation, 59--73","author":"Costa M.","year":"1996","unstructured":"Costa , M. , Guedes , P. , Sequeira , M. , Neves , N. , Castro , N. 1996 . Lightweight logging for lazy Release consistency distributed shared memory . In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation, 59--73 .]] 10.1145\/238721.238762 Costa, M., Guedes, P., Sequeira, M., Neves, N., Castro, N. 1996. Lightweight logging for lazy Release consistency distributed shared memory. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation, 59--73.]] 10.1145\/238721.238762"},{"key":"e_1_2_1_7_1","volume-title":"Tech. Rep. 
TR CMU-CS-96--181","author":"Elnozhay M.","year":"1996","unstructured":"Elnozhay , M. , Alvisi , L. , and Wang , L . 1996 . A survey of rollback\/recovery protocols in message-passing systems, Tech. Rep. TR CMU-CS-96--181 , Carnegie-Mellon University, Pittsburgh, PA .]] Elnozhay, M., Alvisi, L., and Wang, L. 1996. A survey of rollback\/recovery protocols in message-passing systems, Tech. Rep. TR CMU-CS-96--181, Carnegie-Mellon University, Pittsburgh, PA.]]"},{"volume-title":"Proceedings of the 17th Annual International Symposium on Computer Architecture (May)","author":"Gharachorloo K.","key":"e_1_2_1_8_1","unstructured":"Gharachorloo , K. , Lenoski , D. , Laudon , J. , Gibbons , P. , Gupta , A. , and Hennessy , J . 1991. Memory consistency and event ordering in scalable shared-memory multiprocessors . In Proceedings of the 17th Annual International Symposium on Computer Architecture (May) . Seattle, Washington, 15--26.]] 10.1145\/325164.325102 Gharachorloo, K., Lenoski, D., Laudon, J., Gibbons, P., Gupta, A., and Hennessy, J. 1991. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of the 17th Annual International Symposium on Computer Architecture (May). Seattle, Washington, 15--26.]] 10.1145\/325164.325102"},{"key":"e_1_2_1_9_1","series-title":"Lecture Notes on Computer Science 1593","volume-title":"JIAJIA: An SVM system based on a new cache coherence protocol. In Proceedings of the High Performance Computing and Networking","author":"Hu W.","year":"1999","unstructured":"Hu , W. , Shi , W. , and Tang , Z . 1999 . JIAJIA: An SVM system based on a new cache coherence protocol. In Proceedings of the High Performance Computing and Networking . Lecture Notes on Computer Science 1593 , Amsterdam, Netherlands , ( Apr .). 463--472.]] Hu, W., Shi, W., and Tang, Z. 1999. JIAJIA: An SVM system based on a new cache coherence protocol. In Proceedings of the High Performance Computing and Networking. 
Lecture Notes on Computer Science 1593, Amsterdam, Netherlands, (Apr.). 463--472.]]"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1007\/s002240000097","article-title":"Scope consistency: Bridging the gap between release consistency and entry consistency","volume":"31","author":"Iftode L.","year":"1998","unstructured":"Iftode , L. , Singh , J. P. , and Li , K. 1998 . Scope consistency: Bridging the gap between release consistency and entry consistency . Theory of Computing Systems 31 , 4, 451 -- 473 .]] Iftode, L., Singh, J. P., and Li, K. 1998. Scope consistency: Bridging the gap between release consistency and entry consistency. Theory of Computing Systems 31, 4, 451--473.]]","journal-title":"Theory of Computing Systems"},{"volume-title":"Proceedings of the 13th Symposium on Reliable Distributed Systems, Dana Point (Oct.), IEEE Computer Society, Washington, D.C. 42--51","author":"Janakiraman G.","key":"e_1_2_1_12_1","unstructured":"Janakiraman , G. and Tamir , Y . 1994. Coordinated checkpointing-rollback error recovery for DSM multicomputers . In Proceedings of the 13th Symposium on Reliable Distributed Systems, Dana Point (Oct.), IEEE Computer Society, Washington, D.C. 42--51 .]] Janakiraman, G. and Tamir, Y. 1994. Coordinated checkpointing-rollback error recovery for DSM multicomputers. In Proceedings of the 13th Symposium on Reliable Distributed Systems, Dana Point (Oct.), IEEE Computer Society, Washington, D.C. 42--51.]]"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2004.83"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the USENIX","author":"Keleher P.","year":"1994","unstructured":"Keleher , P. , Cox , A. L. , Dwarkadas , S. , and Zwaenepoel , W . 1994. TreadMarks: Distributed shared memory on standard workstations and operating systems . In Proceedings of the USENIX Winter 1994 Technical Conference (Jan.). San Francisco, CA. 115--131.]] Keleher, P., Cox, A. 
L., Dwarkadas, S., and Zwaenepoel, W. 1994. TreadMarks: Distributed shared memory on standard workstations and operating systems. In Proceedings of the USENIX Winter 1994 Technical Conference (Jan.). San Francisco, CA. 115--131.]]"},{"volume-title":"Proceedings of the 18th Symposium on Reliable Distributed Systems","author":"Kongmunvattana A.","key":"e_1_2_1_15_1","unstructured":"Kongmunvattana , A. and Tzeng , N . 1999. Logging and recovery in adaptative software distributed shared memory . In Proceedings of the 18th Symposium on Reliable Distributed Systems , Lausanne, Switzerland, (Oct.). IEEE Computer Society, Washington, D.C. 202--211.]] Kongmunvattana, A. and Tzeng, N. 1999. Logging and recovery in adaptative software distributed shared memory. In Proceedings of the 18th Symposium on Reliable Distributed Systems, Lausanne, Switzerland, (Oct.). IEEE Computer Society, Washington, D.C. 202--211.]]"},{"volume-title":"Proceedings of the 20th International Conference on Distributed Computing Systems (April). 556--563","author":"Kongmunvattana A.","key":"e_1_2_1_16_1","unstructured":"Kongmunvattana , A. , Tanchatchawal , S. , and Tzeng , N . 2000. Coherence-based coordinated checkpointing for software distributed shared memory . In Proceedings of the 20th International Conference on Distributed Computing Systems (April). 556--563 .]] Kongmunvattana, A., Tanchatchawal, S., and Tzeng, N. 2000. Coherence-based coordinated checkpointing for software distributed shared memory. In Proceedings of the 20th International Conference on Distributed Computing Systems (April). 556--563.]]"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.205652"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1997.1332"},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Melo R. F. Walter M. E. M. T. Melo A. C. M. A. Batista R. B. Santana M. N. Martins T. and \n      Fonseca T\n  . \n  2003\n  . 
Comparing two long DNA sequences using a DSM system. In Proceedings of the 9th International Euro-Par Conference Lecture Notes in Computer Science 2790 Klagenfurt Austria (Aug.). \n  Springer-Verlag New York. 517--524.]]  Melo R. F. Walter M. E. M. T. Melo A. C. M. A. Batista R. B. Santana M. N. Martins T. and Fonseca T. 2003. Comparing two long DNA sequences using a DSM system. In Proceedings of the 9th International Euro-Par Conference Lecture Notes in Computer Science 2790 Klagenfurt Austria (Aug.). Springer-Verlag New York. 517--524.]]","DOI":"10.1007\/978-3-540-45209-6_74"},{"volume-title":"Proceedings of the 4th International Symposium on High Performance Computing Architecture, Las Vegas, NV (Feb.). IEEE Computer Society, Washington, D.C. 289--299","author":"Monnerat L.","key":"e_1_2_1_20_1","unstructured":"Monnerat , L. and Bianchini , R . 1998. Efficiently adapting to sharing patterns in software DSMs . In Proceedings of the 4th International Symposium on High Performance Computing Architecture, Las Vegas, NV (Feb.). IEEE Computer Society, Washington, D.C. 289--299 .]] Monnerat, L. and Bianchini, R. 1998. Efficiently adapting to sharing patterns in software DSMs. In Proceedings of the 4th International Symposium on High Performance Computing Architecture, Las Vegas, NV (Feb.). IEEE Computer Society, Washington, D.C. 289--299.]]"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/160551.160553"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the USENIX","author":"Plank J. S.","year":"1995","unstructured":"Plank , J. S. , Beck , M. , Kingsley , G. , and Li , K . 1995. Libckpt: Transparent checkpointing under linux . In Proceedings of the USENIX Winter 1995 Technical Conference (Jan.). New Orleans, LA. 213--224.]] Plank, J. S., Beck, M., Kingsley, G., and Li, K. 1995. Libckpt: Transparent checkpointing under linux. In Proceedings of the USENIX Winter 1995 Technical Conference (Jan.). New Orleans, LA. 
213--224.]]"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith T. F.","year":"1981","unstructured":"Smith , T. F. and Waterman , M. S. 1981 . Identification of common molecular subsequences . Journal of Molecular Biology 147 , 1, 195 -- 197 .]] Smith, T. F. and Waterman, M. S. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147, 1, 195--197.]]","journal-title":"Journal of Molecular Biology"},{"key":"e_1_2_1_25_1","volume-title":"Tech. Rep. ECE-TR-98-03","author":"Speight E.","year":"1998","unstructured":"Speight , E. and Bennet , J . 1998 . Reducing coherence-related communication in software distributed shared memory systems. Tech. Rep. ECE-TR-98-03 , Rice University, Houston , TX .]] Speight, E. and Bennet, J. 1998. Reducing coherence-related communication in software distributed shared memory systems. Tech. Rep. ECE-TR-98-03, Rice University, Houston, TX.]]"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing (Nov.)","author":"Sultan F.","key":"e_1_2_1_26_1","unstructured":"Sultan , F. , Nguyen , T. and Iftode , L . 2000. Scalable fault tolerant distributed shared memory . In Proceedings of the ACM\/IEEE Conference on Supercomputing (Nov.) . Dallas, TX. IEEE Computer Society, Washington, D.C.]] Sultan, F., Nguyen, T. and Iftode, L. 2000. Scalable fault tolerant distributed shared memory. In Proceedings of the ACM\/IEEE Conference on Supercomputing (Nov.). Dallas, TX. IEEE Computer Society, Washington, D.C.]]"},{"key":"e_1_2_1_27_1","volume-title":"Tech. Rep. CHRC-95--16","author":"Wang Y.","year":"1995","unstructured":"Wang , Y. , Chung , P. , and Fuchs , W . 1995 . Tight upper bounds on useful distributed systems checkpoints. Tech. Rep. CHRC-95--16 , University of Urbana-Champaign , Urbana, IL .]] Wang, Y., Chung, P., and Fuchs, W. 1995. 
Tight upper bounds on useful distributed systems checkpoints. Tech. Rep. CHRC-95--16, University of Urbana-Champaign, Urbana, IL.]]"},{"key":"e_1_2_1_28_1","volume-title":"Ckpt: A Checkpoint Library Under Unix","author":"Zandy V.","year":"2003","unstructured":"Zandy , V. 2003 . Ckpt: A Checkpoint Library Under Unix . http:\/\/www.cs.wisc.edu\/~zandy\/ckpt.]] Zandy, V. 2003. Ckpt: A Checkpoint Library Under Unix. http:\/\/www.cs.wisc.edu\/~zandy\/ckpt.]]"}],"container-title":["ACM Journal of Experimental Algorithmics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1187436.1216586","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1187436.1216586","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:08:11Z","timestamp":1750262891000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1187436.1216586"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,2,9]]},"references-count":26,"alternative-id":["10.1145\/1187436.1216586"],"URL":"https:\/\/doi.org\/10.1145\/1187436.1216586","relation":{},"ISSN":["1084-6654","1084-6654"],"issn-type":[{"type":"print","value":"1084-6654"},{"type":"electronic","value":"1084-6654"}],"subject":[],"published":{"date-parts":[[2007,2,9]]}}}