{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T21:30:11Z","timestamp":1768685411839,"version":"3.49.0"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2022,1,3]],"date-time":"2022-01-03T00:00:00Z","timestamp":1641168000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"National Science Foundation","award":["IIS-1845967"],"award-info":[{"award-number":["IIS-1845967"]}]},{"name":"San Diego Supercomputer Center (SDSC) through XSEDE allocations"},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["ACI-1053575"],"award-info":[{"award-number":["ACI-1053575"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As genome-wide reconstruction of phylogenetic trees becomes more widespread, limitations of available data are being appreciated more than ever before. One issue is that phylogenomic datasets are riddled with missing data, and gene trees, in particular, almost always lack representatives from some species otherwise available in the dataset. Since many downstream applications of gene trees require or can benefit from access to complete gene trees, it will be beneficial to algorithmically complete gene trees. Also, gene trees are often unrooted, and rooting them is useful for downstream applications. While completing and rooting a gene tree with respect to a given species tree has been studied, those problems are not studied in depth when we lack such a reference species tree.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We study completion of gene trees without a need for a reference species tree. We formulate an optimization problem to complete the gene trees while minimizing their quartet distance to the given set of gene trees. We extend a seminal algorithm by Brodal et al. to solve this problem in quasi-linear time. In simulated studies and on a large empirical data, we show that completion of gene trees using other gene trees is relatively accurate and, unlike the case where a species tree is available, is unbiased.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Our method, tripVote, is available at https:\/\/github.com\/uym2\/tripVote.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab875","type":"journal-article","created":{"date-parts":[[2021,12,30]],"date-time":"2021-12-30T20:14:28Z","timestamp":1640895268000},"page":"1532-1541","source":"Crossref","is-referenced-by-count":15,"title":["Completing gene trees without species trees in sub-quadratic time"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5065-2814","authenticated-orcid":false,"given":"Uyen","family":"Mai","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University of California San Diego , San Diego, CA 92093, USA"}]},{"given":"Siavash","family":"Mirarab","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of California San Diego , San Diego, CA 92093, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,1,3]]},"reference":[{"key":"2023020201174538600_btab875-B1","author":"Aiemvaravutigul","year":"2019"},{"key":"2023020201174538600_btab875-B2","first-page":"209","author":"Bansal","year":"2018"},{"key":"2023020201174538600_btab875-B3","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1089\/cmb.2012.0037","article-title":"Estimating optimal species trees from incomplete gene trees under deep coalescence","volume":"19","author":"Bayzid","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023020201174538600_btab875-B4","first-page":"1814","author":"Brodal","year":"2013"},{"key":"2023020201174538600_btab875-B5","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1186\/s13015-018-0124-5","article-title":"Octal: optimal completion of gene trees in polynomial time","volume":"13","author":"Christensen","year":"2018","journal-title":"Algorithms Mol. Biol"},{"key":"2023020201174538600_btab875-B6","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1093\/sysbio\/syv041","article-title":"Can we identify genes with increased phylogenetic reliability?","volume":"64","author":"Doyle","year":"2015","journal-title":"Syst. Biol"},{"key":"2023020201174538600_btab875-B7","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/S0304-3975(99)00028-6","article-title":"A few logs suffice to build (almost) all trees: part II","volume":"221","author":"Erdos","year":"1999","journal-title":"Theor. Comput. Sci"},{"key":"2023020201174538600_btab875-B8","doi-asserted-by":"crossref","first-page":"1110","DOI":"10.1093\/molbev\/msv347","article-title":"Avoiding missing data biases in phylogenomic inference: an empirical study in the Landfowl (Aves: Galliformes)","volume":"33","author":"Hosner","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023020201174538600_btab875-B9","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1016\/j.ympev.2013.06.004","article-title":"Effects of missing data on species tree estimation under the coalescent","volume":"69","author":"Hovm\u00f6ller","year":"2013","journal-title":"Mol. Phylogenet. Evol"},{"key":"2023020201174538600_btab875-B10","author":"Jiang","year":"2021"},{"key":"2023020201174538600_btab875-B11","author":"Johansen","year":"2013"},{"key":"2023020201174538600_btab875-B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.tcs.2018.10.005","article-title":"On the weighted quartet consensus problem","volume":"769","author":"Lafond","year":"2019","journal-title":"Theor. Comput. Sci"},{"key":"2023020201174538600_btab875-B13","doi-asserted-by":"crossref","first-page":"R593","DOI":"10.1016\/j.cub.2012.06.013","article-title":"Origin of land plants revisited in the light of sequence contamination and missing data","volume":"22","author":"Laurin-Lemay","year":"2012","journal-title":"Curr. Biol"},{"key":"2023020201174538600_btab875-B14","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1186\/s12864-018-4620-2","article-title":"TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees","volume":"19","author":"Mai","year":"2018","journal-title":"BMC Genomics"},{"key":"2023020201174538600_btab875-B15","doi-asserted-by":"crossref","first-page":"e0182238","DOI":"10.1371\/journal.pone.0182238","article-title":"Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction","volume":"12","author":"Mai","year":"2017","journal-title":"PLoS One"},{"key":"2023020201174538600_btab875-B16","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1093\/sysbio\/syy064","article-title":"Impacts of inference method and data set filtering on phylogenomic resolution in a rapid radiation of ground squirrels (Xerinae: Marmotini)","volume":"68","author":"Mclean","year":"2019","journal-title":"Syst. Biol"},{"key":"2023020201174538600_btab875-B17","author":"Mirarab","year":"2015"},{"key":"2023020201174538600_btab875-B18","doi-asserted-by":"crossref","first-page":"i44","DOI":"10.1093\/bioinformatics\/btv234","article-title":"ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes","volume":"31","author":"Mirarab","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020201174538600_btab875-B19","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1093\/sysbio\/syx077","article-title":"To include or not to include: the impact of gene filtering on species tree estimation methods","volume":"67","author":"Molloy","year":"2018","journal-title":"System. Biol"},{"key":"2023020201174538600_btab875-B20","doi-asserted-by":"crossref","first-page":"i274","DOI":"10.1093\/bioinformatics\/bts218","article-title":"DACTAL: divide-and-conquer trees (almost) without alignments","volume":"28","author":"Nelesen","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020201174538600_btab875-B21","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1186\/s12864-018-4619-8","article-title":"The performance of coalescent-based species tree estimation methods under models of missing data","volume":"19","author":"Nute","year":"2018","journal-title":"BMC Genomics"},{"key":"2023020201174538600_btab875-B22","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1038\/s41586-019-1693-2","article-title":"One thousand plant transcriptomes and the phylogenomics of green plants","volume":"574","author":"OneKP Initiative","year":"2019","journal-title":"Nature"},{"key":"2023020201174538600_btab875-B23","first-page":"1","article-title":"Pitfalls in supermatrix phylogenomics","volume":"280","author":"Philippe","year":"2017","journal-title":"Eur. J. Taxonomy"},{"key":"2023020201174538600_btab875-B24","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1186\/s12864-020-6607-z","article-title":"Forcing external constraints on tree inference using ASTRAL","volume":"21","author":"Rabiee","year":"2020","journal-title":"BMC Genomics"},{"key":"2023020201174538600_btab875-B25","doi-asserted-by":"crossref","first-page":"384","DOI":"10.1093\/sysbio\/syz045","article-title":"INSTRAL: discordance-aware phylogenetic placement using quartet scores","volume":"69","author":"Rabiee","year":"2020","journal-title":"Syst. Biol"},{"key":"2023020201174538600_btab875-B26","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/0025-5564(81)90043-2","article-title":"Comparison of phylogenetic trees","volume":"53","author":"Robinson","year":"1981","journal-title":"Math. Biosci"},{"key":"2023020201174538600_btab875-B27","doi-asserted-by":"crossref","first-page":"2079","DOI":"10.1093\/bioinformatics\/btu157","article-title":"tqDist: a library for computing the quartet and triplet distances between binary or general trees","volume":"30","author":"Sand","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020201174538600_btab875-B28","doi-asserted-by":"crossref","first-page":"3279","DOI":"10.1093\/molbev\/msx261","article-title":"Fragmentary gene sequences negatively impact gene tree and species tree reconstruction","volume":"34","author":"Sayyari","year":"2017","journal-title":"Mol. Biol. Evol"},{"key":"2023020201174538600_btab875-B29","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1089\/cmb.2007.0103","article-title":"Short quartet puzzling: a new quartet-based phylogeny reconstruction algorithm","volume":"15","author":"Snir","year":"2008","journal-title":"J. Comput. Biol"},{"key":"2023020201174538600_btab875-B30","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1080\/14772000.2017.1401016","article-title":"On the importance of homology in the age of phylogenomics","volume":"16","author":"Springer","year":"2018","journal-title":"Syst. Biodiversity"},{"key":"2023020201174538600_btab875-B31","doi-asserted-by":"crossref","first-page":"778","DOI":"10.1093\/sysbio\/syv033","article-title":"Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference","volume":"64","author":"Tan","year":"2015","journal-title":"Syst. Biol"},{"key":"2023020201174538600_btab875-B32","first-page":"186","author":"Warnow","year":"2001"},{"key":"2023020201174538600_btab875-B33","doi-asserted-by":"crossref","first-page":"838","DOI":"10.1093\/molbev\/msv266","article-title":"The impact of missing data on species tree estimation","volume":"33","author":"Xi","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023020201174538600_btab875-B34","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1186\/s12859-018-2129-y","article-title":"ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees","volume":"19","author":"Zhang","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023020201174538600_btab875-B35","author":"Zhang","year":"2020"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab875\/42229354\/btab875.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/6\/1532\/49009868\/btab875.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/6\/1532\/49009868\/btab875.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:38:34Z","timestamp":1675309114000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/6\/1532\/6493250"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,1,3]]},"references-count":35,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab875","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,3,15]]},"published":{"date-parts":[[2022,1,3]]}}}