{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T07:24:22Z","timestamp":1781853862969,"version":"3.54.5"},"reference-count":45,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Previous studies have shown that accounting for site-specific amino acid replacement patterns using mixtures of stationary probability profiles offers a promising approach for improving the robustness of phylogenetic reconstructions in the presence of saturation. However, such profile mixture models were introduced only in a Bayesian context, and are not yet available in a maximum likelihood (ML) framework. In addition, these mixture models only perform well on large alignments, from which they can reliably learn the shapes of profiles, and their associated weights.<\/jats:p><jats:p>Results: In this work, we introduce an expectation\u2013maximization algorithm for estimating amino acid profile mixtures from alignment databases. We apply it, learning on the HSSP database, and observe that a set of 20 profiles is enough to provide a better statistical fit than currently available empirical matrices (WAG, JTT), in particular on saturated data.<\/jats:p><jats:p>Availability: We have implemented these models into two currently available Bayesian and ML phylogenetic reconstruction programs. The two implementations, PhyloBayes, and PhyML, are freely available on our web site (http:\/\/atgc.lirmm.fr\/cat). 
They run under Linux and MacOSX operating systems.<\/jats:p><jats:p>Contact: \u00a0nicolas.lartillot@lirmm.fr<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn445","type":"journal-article","created":{"date-parts":[[2008,8,22]],"date-time":"2008-08-22T00:33:56Z","timestamp":1219365236000},"page":"2317-2323","source":"Crossref","is-referenced-by-count":368,"title":["Empirical profile mixture models for phylogenetic reconstruction"],"prefix":"10.1093","volume":"24","author":[{"given":"Le","family":"Si Quang","sequence":"first","affiliation":[{"name":"M\u00e9thodes et Algorithmes pour la Bioinformatique. LIRMM, CNRS-UM2, 141 rue Ada, 34392 Montpellier Cedex 5, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Olivier","family":"Gascuel","sequence":"additional","affiliation":[{"name":"M\u00e9thodes et Algorithmes pour la Bioinformatique. LIRMM, CNRS-UM2, 141 rue Ada, 34392 Montpellier Cedex 5, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nicolas","family":"Lartillot","sequence":"additional","affiliation":[{"name":"M\u00e9thodes et Algorithmes pour la Bioinformatique. LIRMM, CNRS-UM2, 141 rue Ada, 34392 Montpellier Cedex 5, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2008,8,21]]},"reference":[{"key":"2023020211260127500_B1","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1007\/BF02498640","article-title":"Model of amino acid substitution in proteins encoded by mitochondrial DNA","volume":"42","author":"Adachi","year":"1996","journal-title":"J. Mol. 
Evol."},{"key":"2023020211260127500_B2","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1007\/s002399910038","article-title":"Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA","volume":"50","author":"Adachi","year":"2000","journal-title":"J. Mol. Evol."},{"key":"2023020211260127500_B3","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","article-title":"A new look at the statistical model identification","volume":"AC-19","author":"Akaike","year":"1974","journal-title":"IEEE Trans. Automat. Control"},{"key":"2023020211260127500_B4","doi-asserted-by":"crossref","first-page":"1152","DOI":"10.1214\/aos\/1176342871","article-title":"Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems","volume":"2","author":"Antoniak","year":"1974","journal-title":"Ann. Stat."},{"key":"2023020211260127500_B5","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1080\/10635150500234609","article-title":"An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics","volume":"54","author":"Brinkmann","year":"2005","journal-title":"Syst. Biol."},{"key":"2023020211260127500_B6","doi-asserted-by":"crossref","first-page":"1368","DOI":"10.1093\/oxfordjournals.molbev.a025583","article-title":"Modeling residue usage in aligned protein sequences via maximum likelihood","volume":"13","author":"Bruno","year":"1996","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B7","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1093\/oxfordjournals.molbev.a026334","article-title":"Selection of conserved blocks from multiple alignment for their use in phylogenetic analysis","volume":"17","author":"Castresana","year":"2000","journal-title":"Mol. Biol. 
Evol."},{"key":"2023020211260127500_B8","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1093\/bioinformatics\/bti109","article-title":"An alternative model of amino-acid replacement","volume":"21","author":"Crooks","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020211260127500_B9","first-page":"345","article-title":"A model of evolutionary change in proteins","volume-title":"Atlas of Protein Sequence and Structure","author":"Dayhoff","year":"1978"},{"key":"2023020211260127500_B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. B"},{"key":"2023020211260127500_B11","doi-asserted-by":"crossref","first-page":"2596","DOI":"10.1093\/bioinformatics\/bti325","article-title":"Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases","volume":"21","author":"Dufayard","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020211260127500_B12","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1093\/oxfordjournals.molbev.a025575","article-title":"A hidden Markov model approach to variation among sites in rate of evolution","volume":"13","author":"Felsenstein","year":"1996","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B13","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1007\/BF01734359","article-title":"Evolutionary trees from DNA sequences: a maximum likelihood approach","volume":"17","author":"Felsenstein","year":"1981","journal-title":"J. Mol. Evol."},{"key":"2023020211260127500_B14","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1214\/aos\/1176342360","article-title":"A Bayesian analysis of some nonparametric problems","volume":"1","author":"Ferguson","year":"1973","journal-title":"Ann. 
Stat."},{"key":"2023020211260127500_B15","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1093\/oso\/9780199208227.003.0003","article-title":"Modelling the variability of evolutionary processes","volume-title":"Reconstructing Evolution: New Mathematical and Computational Advances","author":"Gascuel","year":"2007"},{"key":"2023020211260127500_B16","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1006\/jmbi.1996.0569","article-title":"Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses","volume":"263","author":"Goldman","year":"1996","journal-title":"J. Mol. Biol."},{"key":"2023020211260127500_B17","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1093\/genetics\/149.1.445","article-title":"Assessing the impact of secondary structure and solvent accessibility on protein evolution","volume":"149","author":"Goldman","year":"1998","journal-title":"Genetics"},{"key":"2023020211260127500_B18","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1080\/10635150390235520","article-title":"A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood","volume":"52","author":"Guindon","year":"2003","journal-title":"Syst. Biol."},{"key":"2023020211260127500_B19","doi-asserted-by":"crossref","first-page":"910","DOI":"10.1093\/oxfordjournals.molbev.a025995","article-title":"Evolutionary distances for protein-coding sequences: modeling site- specific residue frequencies","volume":"15","author":"Halpern","year":"1998","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B20","doi-asserted-by":"crossref","first-page":"753","DOI":"10.1006\/jmbi.2002.5405","article-title":"An expectation maximization algorithm for training hidden substitution models","volume":"317","author":"Holmes","year":"2002","journal-title":"J. Mol. 
Biol."},{"key":"2023020211260127500_B21","doi-asserted-by":"crossref","first-page":"4338","DOI":"10.1093\/bioinformatics\/bti713","article-title":"Improving the efficiency of SPR moves in phylogenetic tree search methods based on maximum likelihood","volume":"21","author":"Hordijk","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020211260127500_B22","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1080\/10635150701670569","article-title":"A nonparametric method for accommodating and testing across-site rate variation","volume":"56","author":"Huelsenbeck","year":"2007","journal-title":"Syst. Biol."},{"key":"2023020211260127500_B23","doi-asserted-by":"crossref","first-page":"6263","DOI":"10.1073\/pnas.0508279103","article-title":"A Dirichlet process model for detecting positive selection in protein-coding DNA sequences","volume":"103","author":"Huelsenbeck","year":"2006","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020211260127500_B24","first-page":"275","article-title":"The rapid generation of mutation data matrices from protein sequences","volume":"8","author":"Jones","year":"1992","journal-title":"CABIOS"},{"key":"2023020211260127500_B25","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1146\/annurev.micro.56.012302.160854","article-title":"Microsporidia: biology and evolution of highly reduced intracellular parasites","volume":"56","author":"Keeling","year":"2002","journal-title":"Annu. Rev. Microbiol."},{"key":"2023020211260127500_B26","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1007\/BF02100115","article-title":"Evaluation of the maximum likelihood estimate of the evolutionary tree topology from DNA sequence data, and the branching order in Hominoidea","volume":"29","author":"Kishino","year":"1989","journal-title":"J. Mol. 
Evol."},{"key":"2023020211260127500_B27","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1002\/(SICI)1097-0134(19980815)32:3<289::AID-PROT4>3.0.CO;2-D","article-title":"Models of natural mutations including site heterogeneity","volume":"32","author":"Koshi","year":"1998","journal-title":"Proteins"},{"key":"2023020211260127500_B28","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1093\/molbev\/msh112","article-title":"A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process","volume":"21","author":"Lartillot","year":"2004","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B29","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1080\/10635150500433722","article-title":"Computing Bayes factors using thermodynamic integration","volume":"55","author":"Lartillot","year":"2006","journal-title":"Syst. Biol."},{"key":"2023020211260127500_B30","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2148-7-S1-S4","article-title":"Suppressing long branch attraction artefacts in the animal phylogeny using a site-heterogeneous model","volume":"7","author":"Lartillot","year":"2007","journal-title":"BMC Evol. Biol."},{"key":"2023020211260127500_B31","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1093\/molbev\/msn067","article-title":"An improved general amino-acid replacement matrix","volume":"25","author":"Le","year":"2008","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B32","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1080\/10618600.2000.10474879","article-title":"Markov chain sampling methods for Dirichlet process mixture models","volume":"9","author":"Neal","year":"2000","journal-title":"J. Comput. Graph. Stat."},{"key":"2023020211260127500_B33","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1080\/10635150290102393","article-title":"Mapping mutations on phylogenies","volume":"51","author":"Nielsen","year":"2002","journal-title":"Syst. 
Biol."},{"key":"2023020211260127500_B34","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1080\/10635150490468675","article-title":"A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data","volume":"53","author":"Pagel","year":"2004","journal-title":"Syst. Biol."},{"key":"2023020211260127500_B35","doi-asserted-by":"crossref","first-page":"1246","DOI":"10.1093\/molbev\/msi111","article-title":"Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa and Protostomia","volume":"22","author":"Philippe","year":"2005","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B36","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1002\/prot.340090107","article-title":"Database of homology-derived protein structures and the structural meaning of sequence alignment","volume":"9","author":"Sander","year":"1991","journal-title":"Proteins"},{"key":"2023020211260127500_B37","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1093\/sysbio\/42.4.562","article-title":"The growth of phylogenetic information and the need for a phylogenetic database","volume":"42","author":"Sanderson","year":"1993","journal-title":"Syst. Biol."},{"key":"2023020211260127500_B38","first-page":"461","article-title":"Estimating the dimension of a model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat."},{"key":"2023020211260127500_B39","doi-asserted-by":"crossref","first-page":"1114","DOI":"10.1093\/oxfordjournals.molbev.a026201","article-title":"Multiple comparisons of log-likelihoods with applications to phylogenetic inference","volume":"16","author":"Shimodaira","year":"1999","journal-title":"Mol. Biol. 
Evol."},{"key":"2023020211260127500_B40","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1016\/j.tig.2005.04.001","article-title":"Should phylogenetic models be trying to \u2018fit an elephant\u2019?","volume":"21","author":"Steel","year":"2005","journal-title":"Trends Genet."},{"key":"2023020211260127500_B41","doi-asserted-by":"crossref","first-page":"666","DOI":"10.1093\/oxfordjournals.molbev.a025627","article-title":"Combining protein evolution and secondary structure","volume":"13","author":"Thorne","year":"1996","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B42","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1093\/oxfordjournals.molbev.a003851","article-title":"A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach","volume":"18","author":"Whelan","year":"2001","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260127500_B43","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1016\/S0168-9525(01)02272-7","article-title":"Molecular phylogenetics: state-of-the-art methods for looking into the past","volume":"17","author":"Whelan","year":"2001","journal-title":"Trends Genet."},{"key":"2023020211260127500_B44","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1007\/BF00160154","article-title":"Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods","volume":"39","author":"Yang","year":"1994","journal-title":"J. Mol. 
Evol."},{"key":"2023020211260127500_B45","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1093\/genetics\/155.1.431","article-title":"Codon-substitution models for heterogeneous selection pressure at amino-acid sites","volume":"155","author":"Yang","year":"2000","journal-title":"Genetics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/20\/2317\/49052408\/bioinformatics_24_20_2317.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/20\/2317\/49052408\/bioinformatics_24_20_2317.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,31]],"date-time":"2025-01-31T16:39:11Z","timestamp":1738341551000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/20\/2317\/260174"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,21]]},"references-count":45,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2008,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn445","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,10,15]]},"published":{"date-parts":[[2008,8,21]]}}}