{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T09:05:24Z","timestamp":1775639124581,"version":"3.50.1"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T00:00:00Z","timestamp":1652745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fonds de Recherche sur la Sant\u00e9 of Qu\u00e9bec and Fonds de Recherche sur la Nature et Technologies of Qu\u00e9bec"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,27]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Each gene has its own evolutionary history which can substantially differ from evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. However, the output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Cali\u0144ski\u2013Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Our KMeansSuperTreeClustering program along with its C++ source code is available at: https:\/\/github.com\/TahiriNadia\/KMeansSuperTreeClustering.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac326","type":"journal-article","created":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T15:20:05Z","timestamp":1652196005000},"page":"3367-3376","source":"Crossref","is-referenced-by-count":10,"title":["Building alternative consensus trees and supertrees using\n                    <i>k<\/i>\n                    -means and Robinson and Foulds distance"],"prefix":"10.1093","volume":"38","author":[{"given":"Nadia","family":"Tahiri","sequence":"first","affiliation":[{"name":"D\u00e9partement d\u2019informatique, Universit\u00e9 du Qu\u00e9bec \u00e0 Montr\u00e9al , Montreal, QC H2X 3Y7, Canada"},{"name":"D\u00e9partement d\u2019informatique, Universit\u00e9 de Sherbrooke , Sherbrooke, QC J1K 2X9, Canada"}]},{"given":"Bernard","family":"Fichet","sequence":"additional","affiliation":[{"name":"Facult\u00e9 de M\u00e9decine, Aix-Marseille Universit\u00e9 , Marseille F-13385, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3753-5925","authenticated-orcid":false,"given":"Vladimir","family":"Makarenkov","sequence":"additional","affiliation":[{"name":"D\u00e9partement d\u2019informatique, Universit\u00e9 du Qu\u00e9bec \u00e0 Montr\u00e9al , Montreal, QC H2X 3Y7, Canada"}]}],"member":"286","published-online":{"date-parts":[[2022,5,17]]},"reference":[{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1002\/bs.3830120210","article-title":"A clustering technique for summarizing multivariate data","volume":"12","author":"Ball","year":"1967","journal-title":"Behav. Sci"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/1748-7188-5-18","article-title":"Robinson-Foulds supertrees","volume":"5","author":"Bansal","year":"2010","journal-title":"Algorithms Mol. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1016\/j.tim.2004.07.002","article-title":"Phylogenetic reconstruction and lateral gene transfer","volume":"12","author":"Bapteste","year":"2004","journal-title":"Trends Microbiol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1007\/BF01894194","article-title":"The median procedure for n-trees","volume":"3","author":"Barth\u00e9lemy","year":"1986","journal-title":"J. Classif"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"3","DOI":"10.2307\/1222480","article-title":"Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees","volume":"41","author":"Baum","year":"1992","journal-title":"Taxon"},{"key":"2023041408000551600_","volume-title":"Algorithms - ESA\u201999. Lecture Notes in Computer Science","author":"Berry","year":"1999"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1006\/aama.2001.0759","article-title":"Geometry of the space of phylogenetic trees","volume":"27","author":"Billera","year":"2001","journal-title":"Adv. Appl. Math"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4020-2330-9","volume-title":"Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life","author":"Bininda-Emonds","year":"2004"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1093\/sysbio\/syp103","article-title":"Inferring and validating horizontal gene transfer events using bipartition dissimilarity","volume":"59","author":"Boc","year":"2010","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"W573","DOI":"10.1093\/nar\/gks485","article-title":"T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks","volume":"40","author":"Boc","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1080\/10635150600969880","article-title":"Multipolar consensus for phylogenetic trees","volume":"55","author":"Bonnard","year":"2006","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1007\/s00026-004-0229-z","article-title":"On the computational complexity of the rooted subtree prune and regraft distance","volume":"8","author":"Bordewich","year":"2005","journal-title":"Ann. Comb"},{"key":"2023041408000551600_","first-page":"285","author":"Bryant","year":"2000"},{"key":"2023041408000551600_","first-page":"163","volume-title":"DIMACS Series in Discrete Mathematics and Theoretical Computer Science","author":"Bryant","year":"2003"},{"key":"2023041408000551600_","first-page":"387","volume-title":"Mathematics and the Archeological and Historical Sciences","author":"Buneman","year":"1971"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1080\/03610927408827101","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Calinski","year":"1974","journal-title":"Commun. Stat. Theory Methods"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1093\/bioinformatics\/bti020","article-title":"Clann: investigating phylogenetic information through supertree analyses","volume":"21","author":"Creevey","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/978-1-4612-2686-4_2","volume-title":"Classification and Dissimilarity Analysis","author":"Critchley","year":"1994"},{"key":"2023041408000551600_","first-page":"4","article-title":"The supermatrix approach to systematics","volume":"22","author":"de Queiroz","year":"2007","journal-title":"Trends Ecol. Evol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"1250004","DOI":"10.1142\/S0219720012500047","article-title":"Quartets and unrooted phylogenetic networks","volume":"10","author":"Gambette","year":"2012","journal-title":"J. Bioinform. Comput. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"1773","DOI":"10.1007\/s11538-016-0199-4","article-title":"Do branch lengths help to locate a tree in a phylogenetic network?","volume":"78","author":"Gambette","year":"2016","journal-title":"Bull. Math. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1093\/oso\/9780198566106.001.0001","volume-title":"Mathematics of Evolution and Phylogeny","author":"Gascuel","year":"2005"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1186\/1471-2105-14-46","article-title":"Multiple consensus trees: a method to separate divergent genes","volume":"14","author":"Gu\u00e9noche","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/S0166-218X(96)00062-5","article-title":"On the complexity of comparing evolutionary trees","volume":"71","author":"Hein","year":"1996","journal-title":"Discrete Appl. Math"},{"key":"2023041408000551600_","first-page":"88","volume-title":"Annual International Conference on Research in Computational Molecular Biology","author":"Jansson","year":"2013"},{"key":"2023041408000551600_","first-page":"459","article-title":"A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates","volume":"11","author":"Kuhner","year":"1994","journal-title":"Mol. Biol. Evol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1038\/s41586-020-2169-0","article-title":"Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins","volume":"583","author":"Lam","year":"2020","journal-title":"Nature"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"e29903","DOI":"10.1371\/journal.pone.0029903","article-title":"Armadillo 1.1: an original workflow platform for designing and conducting phylogenetic analysis and simulations","volume":"7","author":"Lord","year":"2012","journal-title":"PLoS One"},{"key":"2023041408000551600_","first-page":"281","author":"MacQueen","year":"1967"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1093\/sysbio\/40.3.315","article-title":"The discovery and importance of multiple islands of most-parsimonious trees","volume":"40","author":"Maddison","year":"1991","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"19","DOI":"10.11646\/zootaxa.1668.1.4","article-title":"The tree of life web project","volume":"1668","author":"Maddison","year":"2007","journal-title":"Zootaxa"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1007\/978-3-642-00202-1_24","article-title":"The planar k-means problem is NP-hard","volume":"5431","author":"Mahajan","year":"2009","journal-title":"Lect. Notes Comput. Sci"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1089\/106652701446170","article-title":"Comparison of additive trees using circular orders","volume":"7","author":"Makarenkov","year":"2000","journal-title":"J. Comput. Biol"},{"key":"2023041408000551600_","first-page":"1","article-title":"Horizontal gene transfer and recombination analysis of SARS-CoV-2 genes helps discover its close relatives and shed light on its origin","volume":"21","author":"Makarenkov","year":"2021","journal-title":"BMC Ecol. Evol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1093\/sysbio\/syq091","article-title":"Conservative supertrees","volume":"60","author":"McMorris","year":"2011","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","author":"McMorris","year":"1983"},{"key":"2023041408000551600_","first-page":"1145","article-title":"Penalized model-based clustering with application to variable selection","volume":"8","author":"Pan","year":"2007","journal-title":"J. Mach. Learn. Res"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.meegid.2014.12.022","article-title":"Recombination in viruses: mechanisms, methods of study, and evolutionary consequences","volume":"30","author":"P\u00e9rez-Losada","year":"2015","journal-title":"Infect. Genet. Evol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/1055-7903(92)90035-F","article-title":"Phylogenetic inference based on matrix representation of trees","volume":"1","author":"Ragan","year":"1992","journal-title":"Mol. Phylogenet. Evol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/0025-5564(81)90043-2","article-title":"Comparison of phylogenetic trees","volume":"53","author":"Robinson","year":"1981","journal-title":"Math. Biosci"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1186\/s12864-019-6395-5","article-title":"Detecting horizontal gene transfer: a probabilistic approach","volume":"21","author":"Sevillya","year":"2020","journal-title":"BMC Genomics"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/sysbio\/syab015","article-title":"On defining and finding islands of trees and mitigating large island bias","volume":"70","author":"Silva","year":"2021","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"704","DOI":"10.1109\/TCBB.2008.133","article-title":"Quartets MaxCut: a divide and conquer quartets algorithm","volume":"7","author":"Snir","year":"2010","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf"},{"key":"2023041408000551600_","first-page":"e83","article-title":"The shape of phylogenetic treespace","volume":"66","author":"St. John","year":"2017","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1007\/s00357-007-0003-0","article-title":"Initializing k-means batch clustering: a critical evaluation of several techniques","volume":"24","author":"Steinley","year":"2007","journal-title":"J. Classif"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"S285","DOI":"10.1093\/bioinformatics\/18.suppl_1.S285","article-title":"Statistically based postprocessing of phylogenetic analysis by clustering","volume":"18","author":"Stockham","year":"2002","journal-title":"Bioinformatics"},{"key":"2023041408000551600_","first-page":"793","article-title":"An experimental analysis of Robinson-Foulds distance matrix algorithms","author":"Sul","year":"2008"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1186\/s12862-018-1163-8","article-title":"A new fast method for inferring multiple consensus trees using k-medoids","volume":"18","author":"Tahiri","year":"2018","journal-title":"BMC Evol. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Stat. Soci. B"},{"key":"2023041408000551600_","author":"Wareham","year":"1985"},{"key":"2023041408000551600_","author":"Warnow","year":"2018"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1093\/sysbio\/syu023","article-title":"Supertrees based on the subtree prune-and-regraft distance","volume":"63","author":"Whidden","year":"2014","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1080\/10635150701245370","article-title":"Properties of supertree methods in the consensus setting","volume":"56","author":"Wilkinson","year":"2007","journal-title":"Syst. Biol"},{"key":"2023041408000551600_","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1093\/gbe\/evw065","article-title":"Simulating and summarizing sources of gene tree incongruence","volume":"8","author":"Woodhams","year":"2016","journal-title":"Genome Biol. Evol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac326\/43935897\/btac326.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/13\/3367\/49884269\/btac326.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/13\/3367\/49884269\/btac326.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,21]],"date-time":"2023-11-21T04:44:23Z","timestamp":1700541863000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/13\/3367\/6586801"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,5,17]]},"references-count":55,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2022,6,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac326","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.03.24.436812","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,1]]},"published":{"date-parts":[[2022,5,17]]}}}