{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T11:09:40Z","timestamp":1770894580327,"version":"3.50.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2024,11,21]],"date-time":"2024-11-21T00:00:00Z","timestamp":1732147200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004359","name":"Swedish Research Council","doi-asserted-by":"publisher","award":["2022-03516_VR"],"award-info":[{"award-number":["2022-03516_VR"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004359","name":"Swedish Research Council","doi-asserted-by":"publisher","award":["2018-06217_VR"],"award-info":[{"award-number":["2018-06217_VR"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Phylogenetic reconstruction is a fundamental problem in computational biology. The Neighbor Joining (NJ) algorithm offers an efficient distance-based solution to this problem, which often serves as the foundation for more advanced statistical methods. Despite prior efforts to enhance the speed of NJ, the computation of the n2 entries of the distance matrix, where n is the number of phylogenetic tree leaves, continues to pose a limitation in scaling NJ to larger datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this work, we propose a new algorithm which does not require computing a dense distance matrix. Instead, it dynamically determines a sparse set of at most O(n\u2009log\u2009n) distance matrix entries to be computed in its basic version, and up to O(n\u2009log\u20092n) entries in an enhanced version. We show by experiments that this approach reduces the execution time of NJ for large datasets, with a trade-off in accuracy.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Sparse Neighbor Joining is implemented in Python and freely available at https:\/\/github.com\/kurtsemih\/SNJ.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae701","type":"journal-article","created":{"date-parts":[[2024,11,21]],"date-time":"2024-11-21T16:29:02Z","timestamp":1732206542000},"source":"Crossref","is-referenced-by-count":2,"title":["Sparse Neighbor Joining: rapid phylogenetic inference using a sparse distance matrix"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9760-9140","authenticated-orcid":false,"given":"Semih","family":"Kurt","sequence":"first","affiliation":[{"name":"School of EECS and SciLifeLab, KTH Royal Institute of Technology , Stockholm, 100 44,","place":["Sweden"]}]},{"given":"Alexandre","family":"Bouchard-C\u00f4t\u00e9","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of British Columbia , Vancouver, BC, V6T 1Z4,","place":["Canada"]}]},{"given":"Jens","family":"Lagergren","sequence":"additional","affiliation":[{"name":"School of EECS and SciLifeLab, KTH Royal Institute of Technology , Stockholm, 100 44,","place":["Sweden"]}]}],"member":"286","published-online":{"date-parts":[[2024,11,21]]},"reference":[{"key":"2024121301364051500_btae701-B1","first-page":"2","author":"Abecasis","year":"2007"},{"key":"2024121301364051500_btae701-B2","doi-asserted-by":"publisher","author":"Arvestad","DOI":"10.1101\/2023.10.11.561902"},{"key":"2024121301364051500_btae701-B3","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1007\/PL00008277","article-title":"The performance of neighbor-joining methods of phylogenetic reconstruction","volume":"25","author":"Atteson","year":"1999","journal-title":"Algorithmica"},{"key":"2024121301364051500_btae701-B4","doi-asserted-by":"crossref","first-page":"2529","DOI":"10.1007\/s11538-013-9906-6","article-title":"A note on probabilistic models over strings: the linear algebra approach","volume":"75","author":"Bouchard-C\u00f4t\u00e9","year":"2013","journal-title":"Bull Math Biol"},{"key":"2024121301364051500_btae701-B5","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/1471-2105-3-2","article-title":"The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs","volume":"3","author":"Cannone","year":"2002","journal-title":"BMC Bioinformatics"},{"key":"2024121301364051500_btae701-B6","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1007\/s00357-006-0006-2","article-title":"Maximum transfer distance between partitions","volume":"23","author":"Charon","year":"2006","journal-title":"J Classif"},{"key":"2024121301364051500_btae701-B7","doi-asserted-by":"crossref","first-page":"btac774","DOI":"10.1093\/bioinformatics\/btac774","article-title":"Scaling neighbor joining to one million taxa with dynamic and heuristic neighbor joining","volume":"39","author":"Clausen","year":"2022","journal-title":"Bioinformatics"},{"key":"2024121301364051500_btae701-B8","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1186\/s12859-018-2336-6","article-title":"Rapid and precise alignment of raw reads against redundant databases with KMA","volume":"19","author":"Clausen","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2024121301364051500_btae701-B9","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1016\/0165-4896(81)90042-1","article-title":"The complexity of computing metric distances between partitions","volume":"1","author":"Day","year":"1981","journal-title":"Math Soc Sci"},{"key":"2024121301364051500_btae701-B10","doi-asserted-by":"crossref","first-page":"746","DOI":"10.1038\/s41588-023-01368-0","article-title":"Maximum likelihood pandemic-scale phylogenetics","volume":"55","author":"De Maio","year":"2023","journal-title":"Nat Genet"},{"key":"2024121301364051500_btae701-B11","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024121301364051500_btae701-B12","doi-asserted-by":"crossref","first-page":"1993","DOI":"10.1016\/j.tcs.2008.12.040","article-title":"Fast neighbor joining","volume":"410","author":"Elias","year":"2009","journal-title":"Theor Comput Sci"},{"key":"2024121301364051500_btae701-B13","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1007\/s00239-005-0176-2","article-title":"Relaxed neighbor joining: a fast distance-based phylogenetic tree construction method","volume":"62","author":"Evans","year":"2006","journal-title":"J Mol Evol"},{"key":"2024121301364051500_btae701-B14","volume-title":"Inferring Phylogenies","author":"Felsenstein","year":"2004"},{"key":"2024121301364051500_btae701-B15","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1093\/sysbio\/syx090","article-title":"Effective online bayesian phylogenetics via sequential Monte Carlo with guided proposals","volume":"67","author":"Fourment","year":"2018","journal-title":"Syst Biol"},{"key":"2024121301364051500_btae701-B16","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1093\/oxfordjournals.molbev.a025808","article-title":"BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data","volume":"14","author":"Gascuel","year":"1997","journal-title":"Mol Biol Evol"},{"key":"2024121301364051500_btae701-B17","doi-asserted-by":"crossref","first-page":"4121","DOI":"10.1093\/bioinformatics\/bty407","article-title":"Nextstrain: real-time tracking of pathogen evolution","volume":"34","author":"Hadfield","year":"2018","journal-title":"Bioinformatics"},{"key":"2024121301364051500_btae701-B18","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1016\/S0092-8240(89)80102-8","article-title":"An optimal algorithm to reconstruct trees from additive distance data","volume":"51","author":"Hein","year":"1989","journal-title":"Bull Math Biol"},{"key":"2024121301364051500_btae701-B19","first-page":"185","article-title":"Sur les assemblages de lignes","volume":"70","author":"Jordan","year":"1869","journal-title":"J f\u00fcr die Reine und Angew Math"},{"key":"2024121301364051500_btae701-B20","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1006\/jagm.1996.0035","article-title":"Determining the evolutionary tree using experiments","volume":"21","author":"Kannan","year":"1996","journal-title":"J Algorithms"},{"key":"2024121301364051500_btae701-B21","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1007\/3-540-49116-3_17","volume-title":"STACS 99","author":"Kao","year":"1999"},{"key":"2024121301364051500_btae701-B22","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1186\/1471-2105-14-334","article-title":"Fastphylo: fast tools for phylogenetics","volume":"14","author":"Khan","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2024121301364051500_btae701-B23","author":"King","year":"2003"},{"key":"2024121301364051500_btae701-B24","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1038\/s41586-018-0043-0","article-title":"Renewing Felsenstein\u2019s phylogenetic bootstrap in the era of big data","volume":"556","author":"Lemoine","year":"2018","journal-title":"Nature"},{"key":"2024121301364051500_btae701-B25","doi-asserted-by":"crossref","first-page":"1014","DOI":"10.1109\/TCBB.2011.157","article-title":"A metric for phylogenetic trees based on matching","volume":"9","author":"Lin","year":"2012","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2024121301364051500_btae701-B26","doi-asserted-by":"crossref","first-page":"1530","DOI":"10.1093\/molbev\/msaa015","article-title":"IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era","volume":"37","author":"Minh","year":"2020","journal-title":"Mol Biol Evol"},{"key":"2024121301364051500_btae701-B27","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/s13015-019-0151-x","article-title":"Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge","volume":"14","author":"Molloy","year":"2019","journal-title":"Algorithms Mol Biol"},{"key":"2024121301364051500_btae701-B28","doi-asserted-by":"crossref","first-page":"148","DOI":"10.3390\/a14050148","article-title":"Disjoint tree mergers for large-scale maximum likelihood tree estimation","volume":"14","author":"Park","year":"2021","journal-title":"Algorithms"},{"key":"2024121301364051500_btae701-B29","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2014approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2024121301364051500_btae701-B30","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/0025-5564(81)90043-2","article-title":"Comparison of phylogenetic trees","volume":"53","author":"Robinson","year":"1981","journal-title":"Math Biosci"},{"key":"2024121301364051500_btae701-B31","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-62703-646-7","volume-title":"Multiple Sequence Alignment Methods","author":"Russel","year":"2014"},{"key":"2024121301364051500_btae701-B32","doi-asserted-by":"crossref","first-page":"2079","DOI":"10.1093\/bioinformatics\/btu157","article-title":"tqDist: a library for computing the quartet and triplet distances between binary or general trees","volume":"30","author":"Sand","year":"2014","journal-title":"Bioinformatics"},{"key":"2024121301364051500_btae701-B33","doi-asserted-by":"crossref","first-page":"D23","DOI":"10.1093\/nar\/gky1069","article-title":"Database resources of the national center for biotechnology information","volume":"47","author":"Sayers","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024121301364051500_btae701-B34","author":"scikit-bio development team","year":"2020"},{"key":"2024121301364051500_btae701-B35","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1002\/pro.3290","article-title":"Clustal omega for making accurate alignments of many protein sequences","volume":"27","author":"Sievers","year":"2017","journal-title":"Protein Sci"},{"key":"2024121301364051500_btae701-B36","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1007\/978-3-540-87361-7_10","volume-title":"Algorithms in Bioinformatics.","author":"Simonsen","year":"2008"},{"key":"2024121301364051500_btae701-B37","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1186\/s12864-020-6605-1","article-title":"Unblended disjoint tree merging using GTM improves species tree estimation","volume":"21","author":"Smirnov","year":"2020","journal-title":"BMC Genomics"},{"key":"2024121301364051500_btae701-B38","first-page":"126","article-title":"Distributions of tree comparison metrics\u2014some new results","volume":"42","author":"Steel","year":"1993","journal-title":"Syst Biol"},{"key":"2024121301364051500_btae701-B39","volume-title":"Asymptotic Statistics","author":"van der Vaart","year":"2000"},{"key":"2024121301364051500_btae701-B40","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1007\/978-3-642-04241-6_31","volume-title":"Algorithms in Bioinformatics","author":"Wheeler","year":"2009"},{"key":"2024121301364051500_btae701-B41","doi-asserted-by":"crossref","first-page":"20210244","DOI":"10.1098\/rstb.2021.0244","article-title":"Recent progress on methods for estimating and updating large phylogenies","volume":"377","author":"Zaharias","year":"2022","journal-title":"Philos Trans R Soc Lond B Biol Sci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae701\/60780407\/btae701.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae701\/61099632\/btae701.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/12\/btae701\/61099632\/btae701.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,13]],"date-time":"2024-12-13T01:36:58Z","timestamp":1734053818000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae701\/7906488"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,11,21]]},"references-count":41,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,11,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae701","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,12]]},"published":{"date-parts":[[2024,11,21]]},"article-number":"btae701"}}