{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T20:48:36Z","timestamp":1770842916473,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T00:00:00Z","timestamp":1719532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Tel Aviv University Center for AI and Data Science"},{"name":"Edmond J. Safra Center for Bioinformatics at Tel Aviv University"},{"DOI":"10.13039\/100010663","name":"European Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Union\u2019s Horizon 2020"},{"name":"Research and Innovation Program","award":["882396"],"award-info":[{"award-number":["882396"]}]},{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","award":["993\/17"],"award-info":[{"award-number":["993\/17"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Yandex Initiative for Machine Learning at Tel Aviv University"},{"DOI":"10.13039\/501100003977","name":"Israel Science Foundation","doi-asserted-by":"publisher","award":["2818\/21"],"award-info":[{"award-number":["2818\/21"]}],"id":[{"id":"10.13039\/501100003977","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein\u2019s bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The data supporting this work are available in the Figshare repository at https:\/\/doi.org\/10.6084\/m9.figshare.25050554.v1, and the underlying code is accessible via GitHub at https:\/\/github.com\/noaeker\/bootstrap_repo.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae255","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:26:33Z","timestamp":1719566793000},"page":"i208-i217","source":"Crossref","is-referenced-by-count":4,"title":["A machine-learning-based alternative to phylogenetic bootstrap"],"prefix":"10.1093","volume":"40","author":[{"given":"Noa","family":"Ecker","sequence":"first","affiliation":[{"name":"The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University , Tel Aviv 6997801, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5047-244X","authenticated-orcid":false,"given":"Doroth\u00e9e","family":"Huchon","sequence":"additional","affiliation":[{"name":"School of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University , Tel Aviv 6997801, Israel"},{"name":"The Steinhardt Museum of Natural History and National Research Center, Tel Aviv University , Tel Aviv 6997801, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6891-2645","authenticated-orcid":false,"given":"Yishay","family":"Mansour","sequence":"additional","affiliation":[{"name":"The Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University , Tel Aviv 6997801, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8460-1502","authenticated-orcid":false,"given":"Itay","family":"Mayrose","sequence":"additional","affiliation":[{"name":"School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University , Tel Aviv 6997801, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9463-2575","authenticated-orcid":false,"given":"Tal","family":"Pupko","sequence":"additional","affiliation":[{"name":"The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University , Tel Aviv 6997801, Israel"}]}],"member":"286","published-online":{"date-parts":[[2024,6,28]]},"reference":[{"key":"2024062809024599100_btae255-B1","doi-asserted-by":"crossref","first-page":"3338","DOI":"10.1093\/molbev\/msaa154","article-title":"ModelTeller: model selection for optimal phylogenetic reconstruction using machine learning","volume":"37","author":"Abadi","year":"2020","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B2","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1093\/sysbio\/syr041","article-title":"Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes","volume":"60","author":"Anisimova","year":"2011","journal-title":"Syst Biol"},{"key":"2024062809024599100_btae255-B3","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1080\/10635150600755453","article-title":"Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative","volume":"55","author":"Anisimova","year":"2006","journal-title":"Syst Biol"},{"key":"2024062809024599100_btae255-B2260909","doi-asserted-by":"publisher","first-page":"1983","DOI":"10.1038\/s41467-021-22073-8","article-title":"Harnessing machine learning to guide phylogenetic-tree search algorithms","volume":"12","author":"Azouri","year":"2021","journal-title":"Nat Commun"},{"key":"2024062809024599100_btae255-B5","doi-asserted-by":"crossref","first-page":"I884","DOI":"10.1093\/bioinformatics\/btaa820","article-title":"Using a GTR+\u0393 substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated","volume":"36","author":"Barba-Montoya","year":"2020","journal-title":"Bioinformatics"},{"key":"2024062809024599100_btae255-B6","doi-asserted-by":"crossref","first-page":"107905","DOI":"10.1016\/j.ympev.2023.107905","article-title":"ModelRevelator: fast phylogenetic model estimation via deep learning","volume":"188","author":"Burgstaller-Muehlbacher","year":"2023","journal-title":"Mol Phylogenet Evol"},{"key":"2024062809024599100_btae255-B7","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1109\/TKDE.2018.2807452","article-title":"A comprehensive survey of graph embedding: problems, techniques, and applications","volume":"30","author":"Cai","year":"2018","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2024062809024599100_btae255-B8","doi-asserted-by":"crossref","first-page":"1506","DOI":"10.1093\/bioinformatics\/btz082","article-title":"Incorporating alignment uncertainty into Felsenstein\u2019s phylogenetic bootstrap to improve its reliability","volume":"37","author":"Chang","year":"2021","journal-title":"Bioinformatics"},{"key":"2024062809024599100_btae255-B9","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1093\/sysbio\/syx096","article-title":"Generalized bootstrap supports for phylogenetic analyses of protein sequences incorporating alignment uncertainty","volume":"67","author":"Chatzou","year":"2018","journal-title":"Syst Biol"},{"key":"2024062809024599100_btae255-B10","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1093\/molbev\/msg042","article-title":"Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability","volume":"20","author":"Douady","year":"2003","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B11","doi-asserted-by":"crossref","first-page":"i118","DOI":"10.1093\/bioinformatics\/btac252","article-title":"A LASSO-based approach to sample sites for phylogenetic tree search","volume":"38","author":"Ecker","year":"2022","journal-title":"Bioinformatics"},{"key":"2024062809024599100_btae255-B13","doi-asserted-by":"crossref","first-page":"13429","DOI":"10.1073\/pnas.93.23.13429","article-title":"Bootstrap confidence levels for phylogenetic trees","volume":"93","author":"Efron","year":"1996","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024062809024599100_btae255-B14","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"Efron","year":"1993"},{"key":"2024062809024599100_btae255-B15","doi-asserted-by":"crossref","first-page":"783","DOI":"10.2307\/2408678","article-title":"Confidence limits on phylogenies: an approach using the bootstrap","volume":"39","author":"Felsenstein","year":"1985","journal-title":"Evolution"},{"key":"2024062809024599100_btae255-B16","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1093\/oxfordjournals.molbev.a025991","article-title":"Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis","volume":"15","author":"Galtier","year":"1998","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B17","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1186\/1471-2105-11-286","article-title":"Conserved residue clusters at protein-protein interfaces and their use in binding site identification","volume":"11","author":"Guharoy","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2024062809024599100_btae255-B18","first-page":"1321","article-title":"On calibration of modern neural networks","volume":"70","author":"Guo","year":"2017","journal-title":"Int Conf Mach Learn"},{"key":"2024062809024599100_btae255-B9735891","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msac254","article-title":"From easy to hopeless-predicting the difficulty of phylogenetic analyses","volume":"39","author":"Haag","year":"2022","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B20","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1089\/cmb.1998.5.149","article-title":"Evolution of DNA or amino acid sequences with dependent sites","volume":"5","author":"Von Haeseler","year":"1998","journal-title":"J Comput Biol"},{"key":"2024062809024599100_btae255-B21","doi-asserted-by":"crossref","first-page":"518","DOI":"10.1093\/molbev\/msx281","article-title":"UFBoot2: improving the ultrafast bootstrap approximation","volume":"35","author":"Hoang","year":"2018","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B22","doi-asserted-by":"crossref","first-page":"1741","DOI":"10.1093\/bioinformatics\/btab863","article-title":"RAxML Grove: an empirical phylogenetic tree database","volume":"38","author":"H\u00f6hler","year":"2022","journal-title":"Bioinformatics"},{"key":"2024062809024599100_btae255-B23","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1214\/ss\/1063994979","article-title":"Bootstrapping phylogenetic trees: theory and methods","volume":"18","author":"Holmes","year":"2003","journal-title":"Stat Sci"},{"key":"2024062809024599100_btae255-B24","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/B978-1-4832-3211-9.50009-7","article-title":"Evolution of protein molecules","volume":"3","author":"Jukes","year":"1969","journal-title":"Mamm Protein Metab"},{"key":"2024062809024599100_btae255-B25","first-page":"3147","article-title":"LightGBM: a highly efficient gradient boosting decision tree","volume":"30","author":"Ke","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2024062809024599100_btae255-B26","doi-asserted-by":"crossref","first-page":"4453","DOI":"10.1093\/bioinformatics\/btz305","article-title":"RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference","volume":"35","author":"Kozlov","year":"2019","journal-title":"Bioinformatics"},{"key":"2024062809024599100_btae255-B27","doi-asserted-by":"crossref","first-page":"4674","DOI":"10.1093\/molbev\/msab227","article-title":"Evolutionary sparse learning for phylogenomics","volume":"38","author":"Kumar","year":"2021","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B28","doi-asserted-by":"crossref","first-page":"1217","DOI":"10.1214\/aos\/1176347265","article-title":"The jackknife and the bootstrap for general stationary observations","volume":"17","author":"Kunsch","year":"1989","journal-title":"Ann Stat"},{"key":"2024062809024599100_btae255-B29","doi-asserted-by":"crossref","first-page":"1380","DOI":"10.1093\/molbev\/msm060","article-title":"Heads or tails: a simple reliability check for multiple sequence alignments","volume":"24","author":"Landan","year":"2007","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B30","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1089\/cmb.2019.0500","article-title":"Incorporating nearest-neighbor site dependence into protein evolution models","volume":"27","author":"Larson","year":"2020","journal-title":"J Comput Biol"},{"key":"2024062809024599100_btae255-B31","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1038\/s41586-018-0043-0","article-title":"Renewing Felsenstein\u2019s phylogenetic bootstrap in the era of big data","volume":"556","author":"Lemoine","year":"2018","journal-title":"Nature"},{"key":"2024062809024599100_btae255-B32","doi-asserted-by":"crossref","first-page":"i216","DOI":"10.1093\/bioinformatics\/bth901","article-title":"A nucleotide substitution model with nearest-neighbour interactions","volume":"20(Suppl 1)","author":"Lunter","year":"2004","journal-title":"Bioinformatics"},{"key":"2024062809024599100_btae255-B0220741","doi-asserted-by":"publisher","DOI":"10.1093\/molbev\/msac092","article-title":"AliSim: A fast and versatile phylogenetic sequence simulator for the genomic era","volume":"39","author":"Ly-Trong","year":"2022","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B1803808","doi-asserted-by":"publisher","first-page":"bpab006","DOI":"10.1093\/biomethods\/bpab006","article-title":"Novel metric for hyperbolic phylogenetic tree embeddings","volume":"6","author":"Matsumoto","year":"2021","journal-title":"Biol Methods Protoc"},{"key":"2024062809024599100_btae255-B35","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1093\/molbev\/mst024","article-title":"Ultrafast approximation for phylogenetic bootstrap","volume":"30","author":"Minh","year":"2013","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B36","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/molbev\/msu300","article-title":"IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies","volume":"32","author":"Nguyen","year":"2015","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B37","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2024062809024599100_btae255-B38","doi-asserted-by":"crossref","first-page":"1983","DOI":"10.1093\/molbev\/msq089","article-title":"Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships","volume":"27","author":"Pick","year":"2010","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B39","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2014approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2024062809024599100_btae255-B40","doi-asserted-by":"crossref","first-page":"1313","DOI":"10.1098\/rspb.2002.2025","article-title":"A covarion-based method for detecting molecular adaptation: application to the evolution of primate mitochondrial genomes","volume":"269","author":"Pupko","year":"2002","journal-title":"Proc Biol Sci"},{"key":"2024062809024599100_btae255-B41","first-page":"8844"},{"key":"2024062809024599100_btae255-B42","doi-asserted-by":"crossref","first-page":"3032","DOI":"10.1093\/bioinformatics\/btab129","article-title":"Bali-Phy version 3: model-based co-estimation of alignment and phylogeny","volume":"37","author":"Redelings","year":"2021","journal-title":"Bioinformatics"},{"key":"2024062809024599100_btae255-B43","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1016\/S0022-5193(05)80104-3","article-title":"The general stochastic model of nucleotide substitution","volume":"142","author":"Rodr\u00edguez","year":"1990","journal-title":"J Theor Biol"},{"key":"2024062809024599100_btae255-B44","doi-asserted-by":"crossref","first-page":"W7","DOI":"10.1093\/nar\/gkv318","article-title":"GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters","volume":"43","author":"Sela","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024062809024599100_btae255-B45","doi-asserted-by":"crossref","first-page":"1114","DOI":"10.1093\/oxfordjournals.molbev.a026201","article-title":"Multiple comparisons of log-likelihoods with applications to phylogenetic inference","volume":"16","author":"Shimodaira","year":"1999","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B46","doi-asserted-by":"crossref","first-page":"758","DOI":"10.1080\/10635150802429642","article-title":"A rapid bootstrap algorithm for the RAxML web servers","volume":"57","author":"Stamatakis","year":"2008","journal-title":"Syst Biol"},{"key":"2024062809024599100_btae255-B47","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1093\/sysbio\/syz060","article-title":"Accurate inference of tree topologies from multiple sequence alignments using deep learning","volume":"69","author":"Suvorov","year":"2020","journal-title":"Syst Biol"},{"key":"2024062809024599100_btae255-B48","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1038\/s41559-017-0193","article-title":"Phylogenetic rooting using minimal ancestor deviation","volume":"1","author":"Tria","year":"2017","journal-title":"Nat Ecol Evol"},{"key":"2024062809024599100_btae255-B7177227","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: fundamental algorithms for scientific computing in python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat Methods"},{"key":"2024062809024599100_btae255-B49","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1093\/molbev\/msl155","article-title":"Testing for covarion-like evolution in protein sequences","volume":"24","author":"Wang","year":"2007","journal-title":"Mol Biol Evol"},{"key":"2024062809024599100_btae255-B50","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1126\/science.1151532","article-title":"Alignment uncertainty and genomic analysis","volume":"319","author":"Wong","year":"2008","journal-title":"Science"},{"key":"2024062809024599100_btae255-B51","doi-asserted-by":"crossref","first-page":"5358","DOI":"10.1073\/pnas.1909907117","article-title":"A cnidarian parasite of salmon (Myxozoa: Henneguya) lacks a mitochondrial genome","volume":"117","author":"Yahalomi","year":"2020","journal-title":"Proc Natl Acad Sci USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i208\/58355021\/btae255.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i208\/58355021\/btae255.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:27:19Z","timestamp":1719566839000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_1\/i208\/7700891"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,28]]},"references-count":51,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2024,6,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae255","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,6,28]]}}}