{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T04:07:23Z","timestamp":1776312443908,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2020,7,3]],"date-time":"2020-07-03T00:00:00Z","timestamp":1593734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,12,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The Robinson\u2013Foulds (RF) metric is widely used by biologists, linguists and chemists to quantify similarity between pairs of phylogenetic trees. The measure tallies the number of bipartition splits that occur in both trees\u2014but this conservative approach ignores potential similarities between almost-identical splits, with undesirable consequences. \u2018Generalized\u2019 RF metrics address this shortcoming by pairing splits in one tree with similar splits in the other. Each pair is assigned a similarity score, the sum of which enumerates the similarity between two trees. The challenge lies in quantifying split similarity: existing definitions lack a principled statistical underpinning, resulting in misleading tree distances that are difficult to interpret. Here, I propose probabilistic measures of split similarity, which allow tree similarity to be measured in natural units (bits).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>My new information-theoretic metrics outperform alternative measures of tree similarity when evaluated against a broad suite of criteria, even though they do not account for the non-independence of splits within a single tree. Mutual clustering information exhibits none of the undesirable properties that characterize other tree comparison metrics, and should be preferred to the RF metric.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The methods discussed in this article are implemented in the R package \u2018TreeDist\u2019, archived at https:\/\/dx.doi.org\/10.5281\/zenodo.3528123.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa614","type":"journal-article","created":{"date-parts":[[2020,6,26]],"date-time":"2020-06-26T22:45:46Z","timestamp":1593211546000},"page":"5007-5013","source":"Crossref","is-referenced-by-count":174,"title":["Information theoretic generalized Robinson\u2013Foulds metrics for comparing phylogenetic trees"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5660-1727","authenticated-orcid":false,"given":"Martin R","family":"Smith","sequence":"first","affiliation":[{"name":"Department of Earth Sciences, Lower Mountjoy, Durham University , Durham DH1 3LE, UK"}]}],"member":"286","published-online":{"date-parts":[[2020,7,3]]},"reference":[{"key":"2023062408140635100_btaa614-B1","first-page":"87","author":"Bluis","year":"2003"},{"key":"2023062408140635100_btaa614-B2","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1007\/978-3-642-40453-5_13","volume-title":"Algorithms in Bioinformatics","author":"B\u00f6cker","year":"2013"},{"key":"2023062408140635100_btaa614-B3","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1109\/TCBB.2011.48","article-title":"Matching split distance for unrooted binary phylogenetic trees","volume":"9","author":"Bogdanowicz","year":"2012","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023062408140635100_btaa614-B4","doi-asserted-by":"crossref","first-page":"669","DOI":"10.2478\/amcs-2013-0050","article-title":"On a matching distance between rooted phylogenetic trees","volume":"23","author":"Bogdanowicz","year":"2013","journal-title":"Int. J. Appl. Math. Comput. Sci"},{"key":"2023062408140635100_btaa614-B5","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1089\/cmb.2016.0204","article-title":"Comparing phylogenetic trees by matching nodes using the transfer distance between partitions","volume":"24","author":"Bogdanowicz","year":"2017","journal-title":"J. Comput. Biol"},{"key":"2023062408140635100_btaa614-B6","doi-asserted-by":"crossref","first-page":"193","DOI":"10.2307\/2413326","article-title":"Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units","volume":"34","author":"Estabrook","year":"1985","journal-title":"Syst. Zool"},{"key":"2023062408140635100_btaa614-B7","doi-asserted-by":"crossref","first-page":"50","DOI":"10.2307\/2412378","article-title":"On comparing the shapes of taxonomic trees","volume":"22","author":"Farris","year":"1973","journal-title":"Syst. Zool"},{"key":"2023062408140635100_btaa614-B8","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1007\/BF01908078","article-title":"Obtaining common pruned trees","volume":"2","author":"Finden","year":"1985","journal-title":"J. Classif"},{"key":"2023062408140635100_btaa614-B9","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1111\/j.1096-0031.1993.tb00209.x","article-title":"Estimating character weights during tree search","volume":"9","author":"Goloboff","year":"1993","journal-title":"Cladistics"},{"key":"2023062408140635100_btaa614-B10","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1111\/j.1096-0031.1999.tb00278.x","article-title":"Analyzing large data sets in reasonable times: solutions for composite optima","volume":"15","author":"Goloboff","year":"1999","journal-title":"Cladistics"},{"key":"2023062408140635100_btaa614-B11","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1111\/cla.12160","article-title":"TNT version 1.5, including a full implementation of phylogenetic morphometrics","volume":"32","author":"Goloboff","year":"2016","journal-title":"Cladistics"},{"key":"2023062408140635100_btaa614-B12","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/0025-5564(90)90123-G","article-title":"Reconstructing evolution of sequences subject to recombination using parsimony","volume":"98","author":"Hein","year":"1990","journal-title":"Math. Biosci"},{"key":"2023062408140635100_btaa614-B13","first-page":"459","article-title":"A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates","volume":"11","author":"Kuhner","year":"1994","journal-title":"Mol. Biol. Evol"},{"key":"2023062408140635100_btaa614-B14","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1093\/sysbio\/syu085","article-title":"Practical performance of tree comparison metrics","volume":"64","author":"Kuhner","year":"2015","journal-title":"Syst. Biol"},{"key":"2023062408140635100_btaa614-B15","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1007\/3-540-61332-3_168","volume-title":"Computing and Combinatorics","author":"Li","year":"1996"},{"key":"2023062408140635100_btaa614-B16","doi-asserted-by":"crossref","first-page":"1014","DOI":"10.1109\/TCBB.2011.157","article-title":"A metric for phylogenetic trees based on matching","volume":"4","author":"Lin","year":"2012","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023062408140635100_btaa614-B17","author":"Maechler","year":"2019"},{"key":"2023062408140635100_btaa614-B18","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1146\/annurev.es.16.110185.002243","article-title":"Compatibility methods in systematics","volume":"16","author":"Meacham","year":"1985","journal-title":"Annu. Rev. Ecol. Syst"},{"key":"2023062408140635100_btaa614-B19","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1016\/j.jmva.2006.11.013","article-title":"Comparing clusterings\u2014an information based distance","volume":"98","author":"Meil\u01ce","year":"2007","journal-title":"J. Multivar. Anal"},{"key":"2023062408140635100_btaa614-B20","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1111\/j.1096-0031.1999.tb00277.x","article-title":"The Parsimony Ratchet, a new method for rapid parsimony analysis","volume":"15","author":"Nixon","year":"1999","journal-title":"Cladistics"},{"key":"2023062408140635100_btaa614-B21","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1093\/bioinformatics\/bti720","article-title":"A novel algorithm and web-based tool for comparing two alternative phylogenetic trees","volume":"22","author":"Nye","year":"2006","journal-title":"Bioinformatics"},{"key":"2023062408140635100_btaa614-B22","doi-asserted-by":"crossref","first-page":"75","DOI":"10.2307\/2413347","article-title":"The use of tree comparison metrics","volume":"34","author":"Penny","year":"1985","journal-title":"Syst. Zool"},{"key":"2023062408140635100_btaa614-B23","doi-asserted-by":"crossref","first-page":"e20109","DOI":"10.1371\/journal.pone.0020109","article-title":"On the accuracy of language trees","volume":"6","author":"Pompei","year":"2011","journal-title":"PLoS One"},{"key":"2023062408140635100_btaa614-B24","year":"2019"},{"key":"2023062408140635100_btaa614-B25","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1007\/s10852-005-9022-1","article-title":"Clustering rules: a comparison of partitioning and hierarchical clustering algorithms","volume":"5","author":"Reynolds","year":"2006","journal-title":"J. Math. Model. Algor"},{"key":"2023062408140635100_btaa614-B26","doi-asserted-by":"crossref","DOI":"10.1101\/2020.06.08.141515","article-title":"Information content of trees: three-taxon statements inference rules and dependency","author":"Rineau","year":"2020"},{"key":"2023062408140635100_btaa614-B27","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/0025-5564(81)90043-2","article-title":"Comparison of phylogenetic trees","volume":"53","author":"Robinson","year":"1981","journal-title":"Math. Biosci"},{"key":"2023062408140635100_btaa614-B28","doi-asserted-by":"crossref","first-page":"2079","DOI":"10.1093\/bioinformatics\/btu157","article-title":"tqDist: a library for computing the quartet and triplet distances between binary or general trees","volume":"30","author":"Sand","year":"2014","journal-title":"Bioinformatics"},{"key":"2023062408140635100_btaa614-B29","doi-asserted-by":"crossref","first-page":"592","DOI":"10.1093\/bioinformatics\/btq706","article-title":"phangorn: phylogenetic analysis in R","volume":"27","author":"Schliep","year":"2011","journal-title":"Bioinformatics"},{"key":"2023062408140635100_btaa614-B30","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1002\/j.1538-7305.1948.tb00917.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J"},{"key":"2023062408140635100_btaa614-B31","doi-asserted-by":"crossref","first-page":"20180632","DOI":"10.1098\/rsbl.2018.0632","article-title":"Bayesian and parsimony approaches reconstruct informative trees from simulated morphological datasets","volume":"15","author":"Smith","year":"2019","journal-title":"Biol. Lett"},{"key":"2023062408140635100_btaa614-B32","article-title":"Quartet: comparison of phylogenetic trees using quartet and bipartition measures","author":"Smith","year":"2019","journal-title":"Comprehensive R Archive Network"},{"key":"2023062408140635100_btaa614-B33","article-title":"TBRDist: rearrangement distances between unrooted phylogenetic trees","author":"Smith","year":"2019","journal-title":"Comprehensive R Archive Network"},{"key":"2023062408140635100_btaa614-B34","article-title":"TreeDist: distances between phylogenetic trees","author":"Smith","year":"2020","journal-title":"Comprehensive R Archive Network"},{"key":"2023062408140635100_btaa614-B35","article-title":"TreeDistData: analysis of phylogenetic tree distance measures","author":"Smith","year":"2020","journal-title":"Zenodo"},{"key":"2023062408140635100_btaa614-B36","first-page":"126","article-title":"Distributions of tree comparison metrics\u2014some new results","volume":"42","author":"Steel","year":"1993","journal-title":"Syst. Biol"},{"key":"2023062408140635100_btaa614-B37","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1093\/acprof:oso\/9780199297306.003.0009","volume-title":"Parsimony, Phylogeny, and Genomics","author":"Steel","year":"2006"},{"key":"2023062408140635100_btaa614-B38","doi-asserted-by":"crossref","first-page":"S285","DOI":"10.1093\/bioinformatics\/18.suppl_1.S285","article-title":"Statistically based postprocessing of phylogenetic analysis by clustering","volume":"18","author":"Stockham","year":"2002","journal-title":"Bioinformatics"},{"key":"2023062408140635100_btaa614-B066675","doi-asserted-by":"publisher","author":"et","year":"1998","DOI":"10.1007\/978-3-642-72253-0_12"},{"key":"2023062408140635100_btaa614-B39","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance","volume":"11","author":"Vinh","year":"2010","journal-title":"J. Mach. Learn. Res"},{"key":"2023062408140635100_btaa614-B40","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1007\/s11222-007-9033-z","article-title":"A tutorial on spectral clustering","volume":"17","author":"von Luxburg","year":"2007","journal-title":"Stat. Comput"},{"key":"2023062408140635100_btaa614-B41","article-title":"Calculating the unrooted subtree-prune-and-regraft distance","author":"Whidden","year":"2017"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa614\/33866911\/btaa614.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/20\/5007\/50692951\/btaa614.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/20\/5007\/50692951\/btaa614.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T23:38:48Z","timestamp":1687649928000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/20\/5007\/5866976"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,7,3]]},"references-count":42,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2020,12,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa614","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,10,15]]},"published":{"date-parts":[[2020,7,3]]}}}