{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:06Z","timestamp":1772138046119,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2021,5,26]],"date-time":"2021-05-26T00:00:00Z","timestamp":1621987200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100007316","name":"Klaus Tschira Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100007316","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010663","name":"European Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Union\u2019s Horizon 2020 research and innovation program","award":["882500"],"award-info":[{"award-number":["882500"]}]},{"name":"Ministry of Science, Research and the Arts of Baden-W\u00fcrttemberg","award":["33-7533.-9-10\/20\/2"],"award-info":[{"award-number":["33-7533.-9-10\/20\/2"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Phylogenetic trees are now routinely inferred on large scale high performance computing systems with thousands of cores as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault tolerance of phylogenetic inference tools has become a relevant challenge. To this end, we explore parallel fault tolerance mechanisms and algorithms, the software modifications required and the performance penalties induced via enabling parallel fault tolerance by example of RAxML-NG, the successor of the widely used RAxML tool for maximum likelihood-based phylogenetic tree inference.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We find that the slowdown induced by the necessary additional recovery mechanisms in RAxML-NG is on average 1.00\u2009\u00b1\u20090.04. The overall slowdown by using these recovery mechanisms in conjunction with a fault-tolerant Message Passing Interface implementation amounts to on average 1.7\u2009\u00b1\u20090.6 for large empirical datasets. Via failure simulations, we show that RAxML-NG can successfully recover from multiple simultaneous failures, subsequent failures, failures during recovery and failures during checkpointing. Recoveries are automatic and transparent to the user.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The modified fault-tolerant RAxML-NG code is available under GNU GPL at https:\/\/github.com\/lukashuebner\/ft-raxml-ng.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab399","type":"journal-article","created":{"date-parts":[[2021,5,25]],"date-time":"2021-05-25T15:13:39Z","timestamp":1621955619000},"page":"4056-4063","source":"Crossref","is-referenced-by-count":7,"title":["Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9213-7597","authenticated-orcid":false,"given":"Lukas","family":"H\u00fcbner","sequence":"first","affiliation":[{"name":"Institute of Theoretical Informatics, Karlsruhe Institute of Technology , Baden, Karlsruhe, W\u00fcrttemberg, Germany"},{"name":"Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , Baden, Heidelberg, W\u00fcrttemberg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7394-2718","authenticated-orcid":false,"given":"Alexey M","family":"Kozlov","sequence":"additional","affiliation":[{"name":"Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , Baden, Heidelberg, W\u00fcrttemberg, Germany"}]},{"given":"Demian","family":"Hespe","sequence":"additional","affiliation":[{"name":"Institute of Theoretical Informatics, Karlsruhe Institute of Technology , Baden, Karlsruhe, W\u00fcrttemberg, Germany"}]},{"given":"Peter","family":"Sanders","sequence":"additional","affiliation":[{"name":"Institute of Theoretical Informatics, Karlsruhe Institute of Technology , Baden, Karlsruhe, W\u00fcrttemberg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0353-0691","authenticated-orcid":false,"given":"Alexandros","family":"Stamatakis","sequence":"additional","affiliation":[{"name":"Institute of Theoretical Informatics, Karlsruhe Institute of Technology , Baden, Karlsruhe, W\u00fcrttemberg, Germany"},{"name":"Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies , Baden, Heidelberg, W\u00fcrttemberg, Germany"}]}],"member":"286","published-online":{"date-parts":[[2021,5,26]]},"reference":[{"key":"2023051607111152200_btab399-B1","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1177\/1094342015628056","article-title":"Complex scientific applications made fault-tolerant with the sparse grid combination technique","volume":"30","author":"Ali","year":"2016","journal-title":"Int. J. High Perform. Comput. Appl"},{"key":"2023051607111152200_btab399-B2","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1080\/10635150590905984","article-title":"Missing the forest for the trees: phylogenetic compression and its implications for inferring complex evolutionary histories","volume":"54","author":"An\u00e9","year":"2005","journal-title":"Syst. Biol"},{"key":"2023051607111152200_btab399-B3","author":"Ashraf","year":"2018"},{"key":"2023051607111152200_btab399-B4","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1177\/1094342013488238","article-title":"Post-failure recovery of MPI communication capability","volume":"27","author":"Bland","year":"2013","journal-title":"Int. J. High Perform. Comput. Appl"},{"key":"2023051607111152200_btab399-B5","author":"Bosilca","year":"2020"},{"key":"2023051607111152200_btab399-B6","doi-asserted-by":"crossref","DOI":"10.1016\/j.jpdc.2008.12.002","article-title":"Algorithmic based fault tolerance applied to high performance computing","author":"Bosilca","year":"2009","journal-title":"J. Parallel Distributed Comput"},{"key":"2023051607111152200_btab399-B7","article-title":"Toward exascale resilience: 2014 update","volume":"1","author":"Cappello","year":"2014","journal-title":"Supercomput. Front. Innovations"},{"key":"2023051607111152200_btab399-B8","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-20943-2_1","volume-title":"Fault-Tolerance Techniques for High-Performance Computing","author":"Dongarra","year":"2015"},{"key":"2023051607111152200_btab399-B9","first-page":"47","volume-title":"Proceedings of the 1st International Workshop on Challenges of Large Applications in Distributed Environments, CLADE \u201903","author":"Engelmann","year":"2003"},{"key":"2023051607111152200_btab399-B10","article-title":"A survey of distributed fault tolerance strategies","volume":"2","author":"Gavaskar","year":"2013","journal-title":"Int. J. Adv. Res. Comput. Commun. Eng"},{"key":"2023051607111152200_btab399-B11","volume-title":"Proceedings of the 9th European PVM\/MPI Users\u2019 Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface","author":"Gropp","year":"2002"},{"key":"2023051607111152200_btab399-B12","author":"Gupta","year":"2017"},{"key":"2023051607111152200_btab399-B13","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1088\/1742-6596\/46\/1\/067","article-title":"Berkeley lab checkpoint\/restart (BLCR) for Linux clusters","volume":"46","author":"Hargrove","year":"2006","journal-title":"J. Phys. Conference Ser"},{"key":"2023051607111152200_btab399-B14","doi-asserted-by":"crossref","first-page":"1320","DOI":"10.1126\/science.1253451","article-title":"Whole-genome analyses resolve early branches in the tree of life of modern birds","volume":"346","author":"Jarvis","year":"2014","journal-title":"Science"},{"key":"2023051607111152200_btab399-B15","first-page":"204","volume-title":"Lecture Notes in Computer Science","author":"Kobert","year":"2014"},{"key":"2023051607111152200_btab399-B16","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1177\/1094342018767736","article-title":"A scalable and extensible checkpointing scheme for massively parallel simulations","volume":"33","author":"Kohl","year":"2017","journal-title":"Int. J. High Perform. Comput. Appl"},{"key":"2023051607111152200_btab399-B17","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1093\/bioinformatics\/btv184","article-title":"ExaML version 3 a tool for phylogenomic analyses on supercomputers","volume":"31","author":"Kozlov","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051607111152200_btab399-B18","doi-asserted-by":"crossref","first-page":"4453","DOI":"10.1093\/bioinformatics\/btz305","article-title":"RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference","volume":"35","author":"Kozlov","year":"2019","journal-title":"Bioinformatics"},{"key":"2023051607111152200_btab399-B19","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1177\/1094342015623623","article-title":"Evaluating and extending user-level fault tolerance in MPI applications","volume":"30","author":"Laguna","year":"2016","journal-title":"Int. J. High Perform. Comput. Appl"},{"key":"2023051607111152200_btab399-B20","article-title":"Failure data analysis of HPC systems","author":"Lu","year":"2013","journal-title":"Comput. Sci"},{"key":"2023051607111152200_btab399-B21","year":"2017"},{"key":"2023051607111152200_btab399-B22","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1126\/science.1257570","article-title":"Phylogenomics resolves the timing and pattern of insect evolution","volume":"346","author":"Misof","year":"2014","journal-title":"Science"},{"key":"2023051607111152200_btab399-B23","author":"Obersteiner","year":"2017"},{"key":"2023051607111152200_btab399-B24","doi-asserted-by":"crossref","first-page":"972","DOI":"10.1109\/71.730527","article-title":"Diskless checkpointing","volume":"9","author":"Plank","year":"1998","journal-title":"IEEE Trans. Parallel Distrib. Syst"},{"key":"2023051607111152200_btab399-B25","author":"Roman","year":"2002"},{"key":"2023051607111152200_btab399-B26","doi-asserted-by":"crossref","first-page":"C358","DOI":"10.1137\/17M1128411","article-title":"Extreme-scale block-structured adaptive mesh refinement","volume":"40","author":"Schornbaum","year":"2018","journal-title":"SIAM J. Sci. Comput. (SISC)"},{"key":"2023051607111152200_btab399-B27","first-page":"1","volume-title":"Lecture Notes in Computer Science","author":"Shalf","year":"2011"},{"key":"2023051607111152200_btab399-B28","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-020-20005-6","article-title":"An investigation of irreproducibility in maximum likelihood phylogenetic inference","volume":"11","author":"Shen","year":"2020","journal-title":"Nat. Commun"},{"key":"2023051607111152200_btab399-B29","doi-asserted-by":"crossref","first-page":"618","DOI":"10.1093\/bioinformatics\/btk020","article-title":"Andy: a general, fault-tolerant tool for database searching on computer clusters","volume":"22","author":"Smith","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051607111152200_btab399-B30","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1177\/1094342014522573","article-title":"Addressing failures in exascale computing","volume":"28","author":"Snir","year":"2014","journal-title":"Int. J. High Perform. Comput. Appl"},{"key":"2023051607111152200_btab399-B31","doi-asserted-by":"crossref","first-page":"1312","DOI":"10.1093\/bioinformatics\/btu033","article-title":"RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies","volume":"30","author":"Stamatakis","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051607111152200_btab399-B32","year":"2020"},{"key":"2023051607111152200_btab399-B33","author":"Teranishi","year":"2014"},{"key":"2023051607111152200_btab399-B34","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.compbiomed.2014.02.005","article-title":"Automating fault tolerance in high-performance computational biological jobs using multi-agent approaches","volume":"48","author":"Varghese","year":"2014","journal-title":"Comput. Biol. Med"},{"key":"2023051607111152200_btab399-B35","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/S0141-9331(97)00029-X","article-title":"Algorithm-based fault tolerance: a review","volume":"21","author":"Vijay","year":"1997","journal-title":"Microprocessors Microsyst"},{"key":"2023051607111152200_btab399-B36","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1007\/BF00160154","article-title":"Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods","volume":"39","author":"Yang","year":"1994","journal-title":"J. Mol. Evol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab399\/39309844\/btab399.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/22\/4056\/50335328\/btab399.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/22\/4056\/50335328\/btab399.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T03:17:38Z","timestamp":1684207058000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/22\/4056\/6284957"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,5,26]]},"references-count":36,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2021,11,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab399","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.01.15.426773","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11,15]]},"published":{"date-parts":[[2021,5,26]]}}}