{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T04:09:56Z","timestamp":1759205396266,"version":"3.37.3"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2018,12,12]],"date-time":"2018-12-12T00:00:00Z","timestamp":1544572800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Likelihood ratio tests are commonly used to test for positive selection acting on proteins. They are usually applied with thresholds for declaring a protein under positive selection determined from a chi-square or mixture of chi-square distributions. Although it is known that such distributions are not strictly justified due to the statistical irregularity of the problem, the hope has been that the resulting tests are conservative and do not lose much power in comparison with the same test using the unknown, correct threshold. We show that commonly used thresholds need not yield conservative tests, but instead give larger than expected Type I error rates. Statistical regularity can be restored by using a modified likelihood ratio test.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We give theoretical results to prove that, if the number of sites is not too small, the modified likelihood ratio test gives approximately correct Type I error probabilities regardless of the parameter settings of the underlying null hypothesis. Simulations show that modification gives Type I error rates closer to those stated without a loss of power. The simulations also show that parameter estimation for mixture models of codon evolution can be challenging in certain data-generation settings with very different mixing distributions giving nearly identical site pattern distributions unless the number of taxa and tree length are large. Because mixture models are widely used for a variety of problems in molecular evolution, the challenges and general approaches to solving them presented here are applicable in a broader context.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/jehops\/codeml_modl<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty1019","type":"journal-article","created":{"date-parts":[[2018,12,11]],"date-time":"2018-12-11T20:17:35Z","timestamp":1544559455000},"page":"2545-2554","source":"Crossref","is-referenced-by-count":3,"title":["ModL: exploring and restoring regularity when testing for positive selection"],"prefix":"10.1093","volume":"35","author":[{"given":"Joseph","family":"Mingrone","sequence":"first","affiliation":[{"name":"Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada"},{"name":"Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, NS, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edward","family":"Susko","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada"},{"name":"Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, NS, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joseph P","family":"Bielawski","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada"},{"name":"Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, NS, Canada"},{"name":"Department of Biology, Dalhousie University, Halifax, NS, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2018,12,12]]},"reference":[{"key":"2023062713055034400_bty1019-B1","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1109\/TCBB.2008.52","article-title":"The identifiability of covarion models in phylogenetics","volume":"6","author":"Allman","year":"2009","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinformatics"},{"key":"2023062713055034400_bty1019-B2","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1239\/aap\/1208358894","article-title":"Identifiability of a markovian model of molecular evolution with gamma-distributed rates","volume":"40","author":"Allman","year":"2008","journal-title":"Adv. Appl. Prob"},{"key":"2023062713055034400_bty1019-B3","doi-asserted-by":"crossref","first-page":"1585","DOI":"10.1093\/oxfordjournals.molbev.a003945","article-title":"Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution","volume":"18","author":"Anisimova","year":"2001","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B4","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1093\/oxfordjournals.molbev.a004152","article-title":"Accuracy and power of bayes prediction of amino acid sites under positive selection","volume":"19","author":"Anisimova","year":"2002","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2148-5-65","article-title":"Testing for adaptive evolution of the female reproductive protein zpc in mammals, birds and fishes reveals problems with the m7-m8 likelihood ratio test","volume":"5","author":"Berlin","year":"2005","journal-title":"BMC Evol. Biol"},{"key":"2023062713055034400_bty1019-B6","volume-title":"Mathematical Statistics: Basic Ideas and Selected Topics","author":"Bickel","year":"2001","edition":"2nd edn."},{"key":"2023062713055034400_bty1019-B7","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1093\/sysbio\/syr023","article-title":"On rogers\u2019 proof of identifiability for the gtr+ \u03b3+ i model","volume":"60","author":"Chai","year":"2011","journal-title":"Syst. Biol"},{"key":"2023062713055034400_bty1019-B8","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1111\/1467-9868.00273","article-title":"A modified likelihood ratio test for homogeneity in finite mixture models","volume":"63","author":"Chen","year":"2001","journal-title":"J. R. Stat. Soc. B"},{"key":"2023062713055034400_bty1019-B9","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1111\/j.1467-9868.2004.00434.x","article-title":"Testing for a finite mixture model with two components","volume":"66","author":"Chen","year":"2004","journal-title":"J. R. Stat. Soc. B"},{"key":"2023062713055034400_bty1019-B10","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1080\/24754269.2017.1321883","article-title":"On finite mixture models","volume":"1","author":"Chen","year":"2017","journal-title":"Stat. Theory Relat. Fields"},{"key":"2023062713055034400_bty1019-B11","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/0378-3758(94)00006-H","article-title":"Asymptotic distribution of the likelihood ratio test that a mixture of two binomials is a single binomial","volume":"43","author":"Chernoff","year":"1995","journal-title":"J. Stat. Plan. Inference"},{"key":"2023062713055034400_bty1019-B12","first-page":"1603","article-title":"Modified likelihood ratio test for homogeneity in a two-sample problem","volume":"19","author":"Fu","year":"2009","journal-title":"Stat. Sin"},{"key":"2023062713055034400_bty1019-B13","doi-asserted-by":"crossref","first-page":"2655","DOI":"10.1093\/bioinformatics\/btr470","article-title":"A phylogenetic mixture model for the identification of functionally divergent protein residues","volume":"27","author":"Gaston","year":"2011","journal-title":"Bioinformatics"},{"key":"2023062713055034400_bty1019-B14","first-page":"725","article-title":"A codon-based model of nucleotide substitution for protein-coding DNA sequences","volume":"11","author":"Goldman","year":"1994","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B15","first-page":"807","article-title":"A failure of likelihood asymptotics for normal mixtures","author":"Hartigan","year":"1985","journal-title":"Proceedings of the Berkeley Conference in Honor of J Neyman and J Kiefer"},{"key":"2023062713055034400_bty1019-B16","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1093\/molbev\/msh112","article-title":"A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process","volume":"21","author":"Lartillot","year":"2004","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B17","doi-asserted-by":"crossref","first-page":"2976","DOI":"10.1093\/molbev\/msw160","article-title":"Smoothed bootstrap aggregation for assessing selection pressure at amino acid sites","volume":"33","author":"Mingrone","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B18","doi-asserted-by":"crossref","first-page":"929","DOI":"10.1093\/genetics\/148.3.929","article-title":"Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene","volume":"148","author":"Nielsen","year":"1998","journal-title":"Genetics"},{"key":"2023062713055034400_bty1019-B19","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1080\/10635150490468675","article-title":"A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data","volume":"53","author":"Pagel","year":"2004","journal-title":"Syst. Biol"},{"key":"2023062713055034400_bty1019-B20","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1093\/gbe\/evp012","article-title":"Estimates of positive Darwinian selection are inflated by errors in sequencing, annotation, and alignment","volume":"1","author":"Schneider","year":"2009","journal-title":"Genome Biol. Evol"},{"key":"2023062713055034400_bty1019-B21","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1080\/01621459.1987.10478472","article-title":"Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions","volume":"82","author":"Self","year":"1987","journal-title":"J. Am. Stat. Assoc"},{"key":"2023062713055034400_bty1019-B22","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1093\/molbev\/msh098","article-title":"False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus","volume":"21","author":"Suzuki","year":"2004","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B23","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1093\/oxfordjournals.molbev.a004233","article-title":"Pervasive adaptive evolution in mammalian fertilization proteins","volume":"20","author":"Swanson","year":"2003","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B24","doi-asserted-by":"crossref","first-page":"1280","DOI":"10.1038\/s41559-018-0584-5","article-title":"Multinucleotide mutations cause false inferences of lineage-specific positive selection","volume":"2","author":"Venkat","year":"2018","journal-title":"Nat. Ecol. Evol"},{"key":"2023062713055034400_bty1019-B25","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1186\/1471-2148-8-331","article-title":"A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny","volume":"8","author":"Wang","year":"2008","journal-title":"BMC Evol. Biol"},{"key":"2023062713055034400_bty1019-B26","doi-asserted-by":"crossref","first-page":"1041","DOI":"10.1534\/genetics.104.031153","article-title":"Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites","volume":"168","author":"Wong","year":"2004","journal-title":"Genetics"},{"key":"2023062713055034400_bty1019-B27","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1093\/genetics\/155.1.431","article-title":"Codon-substitution models for heterogeneous selection pressure at amino acid sites","volume":"155","author":"Yang","year":"2000","journal-title":"Genetics"},{"key":"2023062713055034400_bty1019-B28","doi-asserted-by":"crossref","first-page":"1446","DOI":"10.1093\/oxfordjournals.molbev.a026245","article-title":"Maximum-likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites","volume":"17","author":"Yang","year":"2000","journal-title":"Mol. Biol. Evol"},{"key":"2023062713055034400_bty1019-B29","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1093\/molbev\/msi097","article-title":"Bayes empirical bayes inference of amino acid sites under positive selection","volume":"22","author":"Yang","year":"2005","journal-title":"Mol. Biol. Evol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/15\/2545\/50722510\/bioinformatics_35_15_2545.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/15\/2545\/50722510\/bioinformatics_35_15_2545.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T13:06:15Z","timestamp":1687871175000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/15\/2545\/5239994"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2018,12,12]]},"references-count":29,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2019,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty1019","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2019,8,1]]},"published":{"date-parts":[[2018,12,12]]}}}