{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,8,7]],"date-time":"2023-08-07T13:19:11Z","timestamp":1691414351503},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviate from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Motivated by this problem we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software  to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of , we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes,  was able to detect very small differences between two set of gene trees generated under different species trees. Our statistical test can also include tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that  performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software  is freely available under the GNU public license.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-13-210","type":"journal-article","created":{"date-parts":[[2012,8,21]],"date-time":"2012-08-21T10:14:11Z","timestamp":1345544051000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A support vector machine based test for incongruence between sets of trees in tree space"],"prefix":"10.1186","volume":"13","author":[{"given":"David C","family":"Haws","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Huggins","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eric M","family":"O\u2019Neill","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David W","family":"Weisrock","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruriko","family":"Yoshida","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2012,8,21]]},"reference":[{"key":"5551_CR1","doi-asserted-by":"publisher","first-page":"221","DOI":"10.2307\/2408332","volume":"37","author":"AR Templeton","year":"1983","unstructured":"Templeton AR: Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 1983, 37: 221\u2013244. 10.2307\/2408332","journal-title":"Evolution"},{"key":"5551_CR2","doi-asserted-by":"publisher","first-page":"652","DOI":"10.1080\/106351500750049752","volume":"49","author":"N Goldman","year":"2000","unstructured":"Goldman N, Anderson JP, Rodrigo AG: Likelihood-based tests of topologies in phylogenetics. Syst Biol 2000, 49: 652\u2013670. 10.1080\/106351500750049752","journal-title":"Syst Biol"},{"key":"5551_CR3","doi-asserted-by":"publisher","first-page":"546","DOI":"10.1093\/sysbio\/45.4.546","volume":"45","author":"JP Huelsenbeck","year":"1996","unstructured":"Huelsenbeck JP, Hillis DM, Nielsen R: A likelihood-ratio test of monophyly. Syst Biol 1996, 45: 546\u2013558. 10.1093\/sysbio\/45.4.546","journal-title":"Syst Biol"},{"key":"5551_CR4","doi-asserted-by":"publisher","first-page":"412","DOI":"10.1093\/molbev\/msl170","volume":"24","author":"C An\u00e9","year":"2007","unstructured":"An\u00e9 C, Larget B, Baum DA, Smith SD, Rokas A: Bayesian estimation of concordance among gene trees. Mol Biol Evol 2007, 24: 412\u2013426.","journal-title":"Mol Biol Evol"},{"key":"5551_CR5","unstructured":"Wilgenbusch JC, Warren DL, Swofford DL: AWTY: A system for graphical exploration of MCMC convergence in Bayesian phylogenetic inference. [http:\/\/ceb.csit.fsu.edu\/awty2004] []"},{"issue":"3","key":"5551_CR6","doi-asserted-by":"publisher","first-page":"471","DOI":"10.1080\/10635150590946961","volume":"54","author":"DM Hillis","year":"2005","unstructured":"Hillis DM, Heath TA, St. John K: Analysis and visualization of tree space. Syst Biol 2005, 54(3):471\u2013482. 10.1080\/10635150590946961","journal-title":"Syst Biol"},{"key":"5551_CR7","doi-asserted-by":"crossref","unstructured":"Arnaoudova E, Haws D, Huggins P, Jaromczyk JW, Moore N, Schardl C, Yoshida R: Statistical phylogenetic tree analysis using differences of means. Front Psychiatry 2010., 1(47):","DOI":"10.3389\/fnins.2010.00047"},{"key":"5551_CR8","doi-asserted-by":"publisher","first-page":"1615","DOI":"10.1093\/molbev\/mss008","volume":"29","author":"DW Weisrock","year":"2012","unstructured":"Weisrock DW, Smith SD, Chan LM, Biebouw K, Kappeler PM, Yoder AD: Concatenation and concordance in the reconstruction of mouse lemur phylogeny: An empirical demonstration of the effect of allele sampling in phylogenetics. Molecular Biology and Evolution 2012, 29: 1615\u201330. 10.1093\/molbev\/mss008","journal-title":"Molecular Biology and Evolution"},{"key":"5551_CR9","doi-asserted-by":"publisher","first-page":"1565","DOI":"10.1038\/nbt1206-1565","volume":"24","author":"W Noble","year":"2006","unstructured":"Noble W: What is a support vector machine? Nature Biotech 2006, 24: 1565\u20131567. 10.1038\/nbt1206-1565","journal-title":"Nature Biotech"},{"key":"5551_CR10","volume-title":"Oxford lecture series in mathematics and its applications","author":"C Semple","year":"2003","unstructured":"Semple C, Steel M: Oxford lecture series in mathematics and its applications. Vol. 24. London, United Kingdom: Oxford University Press; 2003. xiv+239 xiv+239"},{"key":"5551_CR11","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1057\/ivs.2009.29","volume":"9","author":"M Graham","year":"2010","unstructured":"Graham M, Kennedy J: A survey of multiple tree visualisation. Inf Visualization 2010, 9: 235\u2013252. 10.1057\/ivs.2009.29","journal-title":"Inf Visualization"},{"key":"5551_CR12","doi-asserted-by":"publisher","first-page":"972","DOI":"10.1080\/10635150601089001","volume":"55","author":"AB Smythe","year":"2006","unstructured":"Smythe AB, Sanderson MJ, Nadler SA: Nematode small subunit phylogeny correlates with alignment parameters. Syst Biol 2006, 55: 972\u2013992. 10.1080\/10635150601089001","journal-title":"Syst Biol"},{"key":"5551_CR13","volume-title":"Statistical Approach to Tests Involving Phylogenies","author":"S Holmes","year":"2007","unstructured":"Holmes S: Statistical Approach to Tests Involving Phylogenies. New York, NY,USA: Oxford University Press, USA; 2007."},{"key":"5551_CR14","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-4286-2","volume-title":"Statistical Decision Theory and Bayesian Analysis","author":"J Berger","year":"1985","unstructured":"Berger J: Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag; 1985."},{"key":"5551_CR15","volume-title":"The Recovery of Trees from Measures of Dissimilarity","author":"P Buneman","year":"1971","unstructured":"Buneman P: The Recovery of Trees from Measures of Dissimilarity. Midlothian, United Kingdom: Edinburgh University Press; 1971."},{"key":"5551_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1086\/284325","volume":"125","author":"J Felsenstein","year":"1985","unstructured":"Felsenstein J: Phylogenies and the comparative method. Am Naturalist 1985, 125: 1\u201315. 10.1086\/284325","journal-title":"Am Naturalist"},{"key":"5551_CR17","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1016\/j.jmaa.2010.05.001","volume":"371","author":"A Mir","year":"2010","unstructured":"Mir A, Rossello F: The mean value of the squared path-difference distance for rooted phylogenetic trees. J Math Anal Appl 2010, 371: 168\u2013176. 10.1016\/j.jmaa.2010.05.001","journal-title":"J Math Anal Appl"},{"key":"5551_CR18","doi-asserted-by":"crossref","unstructured":"Golland P, Liang F, Mukherjee S, Panchenko DIn Proc. COLT: Annual Conference on Learning Theory, LNCS; 2005:501\u2013515. vol. 3559. In Proc. COLT: Annual Conference on Learning Theory, LNCS; 2005:501\u2013515. vol. 3559.","DOI":"10.1007\/11503415_34"},{"key":"5551_CR19","volume-title":"Introduction to Stochastic Processes 2nd ed","author":"G Lawler","year":"2000","unstructured":"Lawler G: Introduction to Stochastic Processes 2nd ed. NY: Chapman & Hall\/CRC; 2000."},{"key":"5551_CR20","unstructured":"Maddison WP, Maddison D: Mesquite: a modular system for evolutionary analysis. http:\/\/mesquiteproject.org"},{"issue":"2","key":"5551_CR21","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1109\/34.908974","volume":"23","author":"A Martinez","year":"2001","unstructured":"Martinez A, Kak A: PCA versus LDA. Pattern Analysis and Machine Intelligence, IEEE Transactions on 2001, 23(2):228\u2013233. 10.1109\/34.908974","journal-title":"Pattern Analysis and Machine Intelligence, IEEE Transactions on"},{"key":"5551_CR22","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1007\/BF02101694","volume":"22","author":"M Hasegawa","year":"1985","unstructured":"Hasegawa M, Kishino H, Yano T: Dating the human-ape split by a molecular clock of mitochondrial DNA. J Mol Evolution 1985, 22: 160\u2013174. 10.1007\/BF02101694","journal-title":"J Mol Evolution"},{"key":"5551_CR23","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1093\/genetics\/139.2.993","volume":"139","author":"Z Yang","year":"1995","unstructured":"Yang Z: A space-time process model for the evolution of DNA sequences. Genetics 1995, 139: 993\u20131005.","journal-title":"Genetics"},{"key":"5551_CR24","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1080\/10635150500354928","volume":"55","author":"W Maddison","year":"2006","unstructured":"Maddison W, Knowles L: Inferring phylogeny despite incomplete lineage sorting. Syst Biol 2006, 55: 21\u201330. 10.1080\/10635150500354928","journal-title":"Syst Biol"},{"key":"5551_CR25","doi-asserted-by":"publisher","first-page":"16","DOI":"10.2307\/2408542","volume":"38","author":"J Felsenstein","year":"1984","unstructured":"Felsenstein J: Distance methods for inferring phylogenies: A justification. Evolution 1984, 38: 16\u201324. 10.2307\/2408542","journal-title":"Evolution"},{"key":"5551_CR26","unstructured":"Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by author. Department of Genome Sciences University of Washington, Seattle. 2005."},{"key":"5551_CR27","doi-asserted-by":"publisher","first-page":"696","DOI":"10.1080\/10635150390235520","volume":"52","author":"S Guindon","year":"2003","unstructured":"Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 2003, 52: 696\u2013704. 10.1080\/10635150390235520","journal-title":"Syst Biol"},{"key":"5551_CR28","doi-asserted-by":"publisher","first-page":"754","DOI":"10.1093\/bioinformatics\/17.8.754","volume":"17","author":"J Huelsenbeck","year":"2001","unstructured":"Huelsenbeck J, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 2001, 17: 754\u2013755. 10.1093\/bioinformatics\/17.8.754","journal-title":"Bioinformatics"},{"key":"5551_CR29","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","volume":"27","author":"T Fawcett","year":"2006","unstructured":"Fawcett T: An introduction to ROC analysis. Pattern Recognit Lett 2006, 27: 861\u2013874. 10.1016\/j.patrec.2005.10.010","journal-title":"Pattern Recognit Lett"},{"key":"5551_CR30","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1093\/clinchem\/39.4.561","volume":"39","author":"M Zweig","year":"1993","unstructured":"Zweig M, Campbell G: Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993, 39: 561\u2013577.","journal-title":"Clin Chem"},{"key":"5551_CR31","volume-title":"The R FAQ","author":"K Hornik","year":"2011","unstructured":"Hornik K: The R FAQ. 2011.http:\/\/CRAN.R-project.org\/doc\/FAQ\/R-FAQ.html []"},{"key":"5551_CR32","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1093\/gbe\/evr013","volume":"3","author":"C An\u00e9","year":"2011","unstructured":"An\u00e9 C: Detecting phylogenetic breakpoints and discordance from genome-wide alignments for species tree reconstruction. Genome Biol and Evolution 2011, 3: 246\u2013258. 10.1093\/gbe\/evr013","journal-title":"Genome Biol and Evolution"},{"key":"5551_CR33","volume-title":"Sas for Linear Models","author":"R Littell","year":"2002","unstructured":"Littell R, Stroup W, Freund R: Sas for Linear Models. 4th edition. Cary: SAS Institute, Inc.; 2002 4th edition. Cary: SAS Institute, Inc.; 2002","edition":"4"},{"key":"5551_CR34","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1016\/0025-5564(81)90043-2","volume":"53","author":"DR Robinson","year":"1981","unstructured":"Robinson DR, Foulds LR: Comparison of phylogenetic trees. Math Biosci 1981, 53: 131\u2013147. 10.1016\/0025-5564(81)90043-2","journal-title":"Math Biosci"},{"issue":"2","key":"5551_CR35","doi-asserted-by":"publisher","first-page":"193","DOI":"10.2307\/2413326","volume":"34","author":"GF Estabrook","year":"1985","unstructured":"Estabrook GF, McMorris FR, Meacham CA: Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst Zool 1985, 34(2):193\u2013200. 10.2307\/2413326","journal-title":"Syst Zool"},{"key":"5551_CR36","first-page":"19","volume-title":"Molecular zoology: Advances, strategies, and protocols","author":"J Hulesenbeck","year":"1996","unstructured":"Hulesenbeck J, Hillis DM, Jones R: Parametric boostrapping in molecular phylogenetics: Application and performance. In Molecular zoology: Advances, strategies, and protocols. Edited by: Ferraris J, Palumbi S. New York: Wiley-Liss; 1996:19\u201345."},{"issue":"12","key":"5551_CR37","doi-asserted-by":"publisher","first-page":"496","DOI":"10.1016\/S0169-5347(00)01994-7","volume":"15","author":"Z Yang","year":"2000","unstructured":"Yang Z, Bielawski J: Statistical methods for detecting molecular adaptation. Trends Ecol Evol 2000, 15(12):496\u2013503. 10.1016\/S0169-5347(00)01994-7","journal-title":"Trends Ecol Evol"},{"key":"5551_CR38","doi-asserted-by":"publisher","first-page":"1891","DOI":"10.1093\/molbev\/msl051","volume":"23","author":"L Sergei","year":"2006","unstructured":"Sergei L, Kosakovsky P, Posada D, Gravenor MB, Woelk CH, Frost SDW: Automated phylogenetic detection of recombination using a genetic algorithm. Mol Biol Evol 2006, 23: 1891\u20131901. 10.1093\/molbev\/msl051","journal-title":"Mol Biol Evol"},{"issue":"3","key":"5551_CR39","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1080\/10618600.2012.640901","volume":"21","author":"J Chakerian","year":"2012","unstructured":"Chakerian J, Holmes S: Computational tools for evaluating phylogenetic and hierarchical clustering trees. Journal of Computational and Graphical Statistics 2012, 21(3):581\u2013599. 10.1080\/10618600.2012.640901","journal-title":"Journal of Computational and Graphical Statistics"},{"key":"5551_CR40","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1093\/bioinformatics\/18.suppl_1.S285","volume":"18","author":"C Stockham","year":"2002","unstructured":"Stockham C, Wang L, Warnow T: Statistically-based postprocessing of phylogenetic analysis using clustering. Bioinformatics 2002, 18: 285\u2013293. 10.1093\/bioinformatics\/18.suppl_1.S285","journal-title":"Bioinformatics"},{"issue":"4","key":"5551_CR41","doi-asserted-by":"publisher","first-page":"590","DOI":"10.1093\/sysbio\/46.4.590","volume":"46","author":"D Maddison","year":"1997","unstructured":"Maddison D, Swofford D, Maddison W: NEXUS: an extensible file format for systematic information. Syst Biol 1997, 46(4):590\u2013621. 10.1093\/sysbio\/46.4.590","journal-title":"Syst Biol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-210.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T21:09:14Z","timestamp":1630530554000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-210"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,8,21]]},"references-count":41,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["5551"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-210","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,8,21]]},"assertion":[{"value":"20 November 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 May 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 August 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"210"}}