{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T13:53:03Z","timestamp":1781272383215,"version":"3.54.1"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,12,1]],"date-time":"2022-12-01T00:00:00Z","timestamp":1669852800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union\u2019s Horizon 2020","award":["874735"],"award-info":[{"award-number":["874735"]}]},{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Foundation","doi-asserted-by":"publisher","award":["NNF16OC0021856"],"award-info":[{"award-number":["NNF16OC0021856"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Global Surveillance of Antimicrobial Resistance"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The neighbor-joining (NJ) algorithm is a widely used method to perform iterative clustering and forms the basis for phylogenetic reconstruction in several bioinformatic pipelines. Although NJ is considered to be a computationally efficient algorithm, it does not scale well for datasets exceeding several thousand taxa (&gt;100 000). 
Optimizations to the canonical NJ algorithm have been proposed; these optimizations are, however, achieved through approximations or extensive memory usage, which is not feasible for large datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this article, two new algorithms, dynamic neighbor joining (DNJ) and heuristic neighbor joining (HNJ), are presented, which optimize the canonical NJ method to scale to millions of taxa without increasing the memory requirements. Both DNJ and HNJ outperform the current gold standard methods to construct NJ trees, while DNJ is guaranteed to produce exact NJ trees.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/bitbucket.org\/genomicepidemiology\/ccphylo.git<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac774","type":"journal-article","created":{"date-parts":[[2022,12,1]],"date-time":"2022-12-01T17:50:30Z","timestamp":1669917030000},"source":"Crossref","is-referenced-by-count":12,"title":["Scaling neighbor joining to one million taxa with dynamic and heuristic neighbor joining"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8197-7520","authenticated-orcid":false,"given":"Philip T L C","family":"Clausen","sequence":"first","affiliation":[{"name":"Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark , 2800 Kgs. 
Lyngby, Denmark"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,12,1]]},"reference":[{"key":"2023010107540475500_btac774-B1","first-page":"160","volume-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","author":"Campello","year":"2013"},{"key":"2023010107540475500_btac774-B2","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1016\/j.cell.2017.04.016","article-title":"An immune atlas of clear cell renal cell carcinoma","volume":"169","author":"Chevrier","year":"2017","journal-title":"Cell"},{"key":"2023010107540475500_btac774-B3","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-018-2336-6","article-title":"Rapid and precise alignment of raw reads against redundant databases with KMA","volume":"19","author":"Clausen","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023010107540475500_btac774-B4","doi-asserted-by":"crossref","first-page":"6457","DOI":"10.1038\/s41598-022-10097-z","article-title":"Rapid evolution of SARS-CoV-2 challenges human defenses","volume":"12","author":"Duarte","year":"2022","journal-title":"Sci. Rep"},{"key":"2023010107540475500_btac774-B5","doi-asserted-by":"crossref","first-page":"1993","DOI":"10.1016\/j.tcs.2008.12.040","article-title":"Fast neighbor joining","volume":"410","author":"Elias","year":"2009","journal-title":"Theor. Comput. Sci"},{"key":"2023010107540475500_btac774-B6","first-page":"226","author":"Ester","year":"1996"},{"key":"2023010107540475500_btac774-B7","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1007\/s00239-005-0176-2","article-title":"Relaxed neighbor joining: a fast distance-based phylogenetic tree construction method","volume":"62","author":"Evans","year":"2006","journal-title":"J. Mol. 
Evol"},{"key":"2023010107540475500_btac774-B8","first-page":"768","article-title":"Cluster analysis of multivariate data: efficiency versus interpretability of classifications","volume":"21","author":"Forgy","year":"1965","journal-title":"Biometrics"},{"key":"2023010107540475500_btac774-B9","doi-asserted-by":"crossref","DOI":"10.1093\/biomethods\/bpab008","article-title":"MINTyper: an outbreak-detection method for accurate and rapid SNP typing of clonal clusters with noisy long reads","volume":"6","author":"Hallgren","year":"2021","journal-title":"Biol. Methods Protoc"},{"key":"2023010107540475500_btac774-B10","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1002\/pro.5560010313","article-title":"Selection of representative protein data sets","volume":"1","author":"Hobohm","year":"1992","journal-title":"Protein Sci"},{"key":"2023010107540475500_btac774-B11","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1093\/bioinformatics\/btl592","article-title":"PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences","volume":"23","author":"Katoh","year":"2007","journal-title":"Bioinformatics"},{"key":"2023010107540475500_btac774-B12","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1186\/1471-2105-14-334","article-title":"Fastphylo: fast tools for phylogenetics","volume":"14","author":"Khan","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023010107540475500_btac774-B13","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023010107540475500_btac774-B14","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1186\/s12859-015-0508-1","article-title":"Classification of bioinformatics workflows using weighted versions of partitioning and hierarchical clustering 
algorithms","volume":"16","author":"Lord","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023010107540475500_btac774-B15","doi-asserted-by":"crossref","first-page":"2461","DOI":"10.1093\/molbev\/msaa131","article-title":"Corrigendum to: IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era","volume":"37","author":"Minh","year":"2020","journal-title":"Mol. Biol. Evol"},{"key":"2023010107540475500_btac774-B16","first-page":"298","author":"Nagpal","year":"2013"},{"key":"2023010107540475500_btac774-B17","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023010107540475500_btac774-B18","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1126\/science.abf2946","article-title":"Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK","volume":"371","author":"Du Plessis","year":"2021","journal-title":"Science"},{"key":"2023010107540475500_btac774-B19","doi-asserted-by":"crossref","first-page":"e9490","DOI":"10.1371\/journal.pone.0009490","article-title":"FastTree 2\u2013approximately maximum-likelihood trees for large alignments","volume":"5","author":"Price","year":"2010","journal-title":"PLoS One"},{"key":"2023010107540475500_btac774-B20","first-page":"406","article-title":"The neighbor-joining method: a new method for reconstructing phylogenetic trees","volume":"4","author":"Saitou","year":"1987","journal-title":"Mol. Biol. 
Evol"},{"key":"2023010107540475500_btac774-B21","doi-asserted-by":"crossref","first-page":"D23","DOI":"10.1093\/nar\/gky1069","article-title":"Database resources of the national center for biotechnology information","volume":"47","author":"Sayers","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023010107540475500_btac774-B22","doi-asserted-by":"crossref","first-page":"2823","DOI":"10.1093\/bioinformatics\/btl478","article-title":"Clearcut: a fast implementation of relaxed neighbor joining","volume":"22","author":"Sheneman","year":"2006","journal-title":"Bioinformatics"},{"key":"2023010107540475500_btac774-B23","first-page":"707","author":"Shirkhorshidi","year":"2014"},{"key":"2023010107540475500_btac774-B24","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1093\/comjnl\/16.1.30","article-title":"SLINK: an optimally efficient algorithm for the single-link cluster method","volume":"16","author":"Sibson","year":"1973","journal-title":"Comput. J"},{"key":"2023010107540475500_btac774-B25","first-page":"113","author":"Simonsen","year":"2008"},{"key":"2023010107540475500_btac774-B26","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1038\/s42003-020-0869-5","article-title":"Large scale automated phylogenomic analysis of bacterial isolates and the evergreen online platform","volume":"3","author":"Szarvas","year":"2020","journal-title":"Commun. Biol"},{"key":"2023010107540475500_btac774-B27","article-title":"GenomeTrakr proficiency testing for foodborne pathogen surveillance: an exercise from 2015","volume":"4, e000185","author":"Timme","year":"2018","journal-title":"Microb. 
Genomics"},{"key":"2023010107540475500_btac774-B28","first-page":"375","author":"Wheeler","year":"2009"},{"key":"2023010107540475500_btac774-B29","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/s41586-020-2008-3","article-title":"A new coronavirus associated with human respiratory disease in China","volume":"579","author":"Wu","year":"2020","journal-title":"Nature"},{"key":"2023010107540475500_btac774-B30","doi-asserted-by":"crossref","first-page":"2640","DOI":"10.1093\/jac\/dks261","article-title":"Identification of acquired antimicrobial resistance genes","volume":"67","author":"Zankari","year":"2012","journal-title":"J. Antimicrob. Chemother"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac774\/47774019\/btac774.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac774\/48448937\/btac774.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac774\/48448937\/btac774.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T10:12:18Z","timestamp":1672567938000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btac774\/6858462"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2022,12,1]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12,1]]},"published-print":{"date-parts":[[2023,1,1]]}},"URL":"http
s:\/\/doi.org\/10.1093\/bioinformatics\/btac774","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,12,1]]},"article-number":"btac774"}}