{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T22:21:05Z","timestamp":1776291665084,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T00:00:00Z","timestamp":1655856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Centers for Disease Control and Prevention BAA","award":["200-2021-11554"],"award-info":[{"award-number":["200-2021-11554"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["T32HG008345"],"award-info":[{"award-number":["T32HG008345"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["F31HG010584"],"award-info":[{"award-number":["F31HG010584"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Australian National University Futures"},{"name":"Australian Research Council Discovery","award":["DP200103151"],"award-info":[{"award-number":["DP200103151"]}]},{"name":"Chan-Zuckerberg Initiative Grant for Essential Open Source Software for Science"},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R35GM128932"],"award-info":[{"award-number":["R35GM128932"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Eric and Wendy Schmidt Foundation"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods. We have developed this method particularly to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping refine on a daily basis possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The matOptimize code is freely available as part of the UShER package (https:\/\/github.com\/yatisht\/usher) and can also be installed via bioconda (https:\/\/bioconda.github.io\/recipes\/usher\/README.html). All scripts we used to perform the experiments in this manuscript are available at https:\/\/github.com\/yceh\/matOptimize-experiments.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac401","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T10:45:41Z","timestamp":1655894741000},"page":"3734-3740","source":"Crossref","is-referenced-by-count":34,"title":["matOptimize: a parallel tree optimization method enables online phylogenetics for SARS-CoV-2"],"prefix":"10.1093","volume":"38","author":[{"given":"Cheng","family":"Ye","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of California, San Diego , San Diego, CA 92093, USA"}]},{"given":"Bryan","family":"Thornlow","sequence":"additional","affiliation":[{"name":"Department of Biomolecular Engineering, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"},{"name":"Genomics Institute, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1697-1130","authenticated-orcid":false,"given":"Angie","family":"Hinrichs","sequence":"additional","affiliation":[{"name":"Genomics Institute, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"}]},{"given":"Alexander","family":"Kramer","sequence":"additional","affiliation":[{"name":"Department of Biomolecular Engineering, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"},{"name":"Genomics Institute, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"}]},{"given":"Cade","family":"Mirchandani","sequence":"additional","affiliation":[{"name":"Department of Biomolecular Engineering, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"},{"name":"Genomics Institute, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"}]},{"given":"Devika","family":"Torvi","sequence":"additional","affiliation":[{"name":"Department of Bioengineering, University of California, San Diego , San Diego, CA 92093, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1140-2596","authenticated-orcid":false,"given":"Robert","family":"Lanfear","sequence":"additional","affiliation":[{"name":"Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra , ACT 2601, Australia"}]},{"given":"Russell","family":"Corbett-Detig","sequence":"additional","affiliation":[{"name":"Department of Biomolecular Engineering, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"},{"name":"Genomics Institute, University of California, Santa Cruz , Santa Cruz, CA 95064, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5600-2900","authenticated-orcid":false,"given":"Yatish","family":"Turakhia","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of California, San Diego , San Diego, CA 92093, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"2023041405351887000_","doi-asserted-by":"crossref","DOI":"10.1101\/2021.09.20.21263869","volume-title":"Genomic surveillance in Japan of AY.29\u2014a new sub-lineage of SARS-CoV-2 delta variant with C5239T and T5514C mutations","author":"Abe","year":"2021"},{"key":"2023041405351887000_","first-page":"1735","author":"Chen","year":"2021"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"D67","DOI":"10.1093\/nar\/gkv1276","article-title":"GenBank","volume":"44","author":"Clark","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023041405351887000_","volume-title":"Introduction to Algorithms","author":"Cormen","year":"2009","edition":"3rd edn"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1038\/s41564-020-00838-z","article-title":"Genomic epidemiology reveals multiple introductions of SARS-CoV-2 from mainland Europe into Scotland","volume":"6","author":"da Silva Filipe","year":"2021","journal-title":"Nat. Microbiol"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1126\/science.abb9263","article-title":"Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California","volume":"369","author":"Deng","year":"2020","journal-title":"Science"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"5769","DOI":"10.1038\/s41467-021-26055-8","article-title":"Emergence and spread of SARS-CoV-2 lineage B.1.620 with variant of concern-like mutations and deletions","volume":"12","author":"Dudas","year":"2021","journal-title":"Nat. Commun"},{"key":"2023041405351887000_","volume-title":"PHYLIP (Phylogeny Inference Package) Department of Genome Sciences","author":"Felsenstein","year":"2005"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1093\/sysbio\/20.4.406","article-title":"Toward defining the course of evolution: minimum change for a specific tree topology","volume":"20","author":"Fitch","year":"1971","journal-title":"Syst. Biol"},{"key":"2023041405351887000_","first-page":"185","article-title":"Assessment of inter-laboratory differences in SARS-CoV-2 consensus genome assemblies between public health laboratories in Australia","volume-title":"Viruses","author":"Foster","year":"2022"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","DOI":"10.1101\/2021.04.23.441209","volume-title":"Insertions in SARS-CoV-2 genome caused by template switch and duplications give rise to new variants that merit monitoring","author":"Garushyants","year":"2021"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"1832","DOI":"10.1093\/molbev\/msaa047","article-title":"Online Bayesian phylodynamic inference in BEAST with application to epidemic reconstruction","volume":"37","author":"Gill","year":"2020","journal-title":"Mol. Biol. Evol"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1111\/j.1096-0031.1997.tb00239.x","article-title":"Efficient incremental character optimization","volume":"13","author":"Gladstein","year":"1997","journal-title":"Cladistics"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1111\/j.1096-0031.1996.tb00009.x","article-title":"Methods for faster parsimony analysis","volume":"12","author":"Goloboff","year":"1996","journal-title":"Cladistics"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1111\/j.1096-0031.1999.tb00278.x","article-title":"Analyzing large data sets in reasonable times: solutions for composite optima","volume":"15","author":"Goloboff","year":"1999","journal-title":"Cladistics"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1111\/cla.12160","article-title":"TNT version 1.5, including a full implementation of phylogenetic morphometrics","volume":"32","author":"Goloboff","year":"2016","journal-title":"Cladistics"},{"key":"2023041405351887000_","volume-title":"Using MPI: Portable Parallel Programming with the Message-Passing Interface","author":"Gropp","year":"1999","edition":"2nd edn"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/s12862-018-1131-3","article-title":"MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation","volume":"18","author":"Hoang","year":"2018","journal-title":"BMC Evol. Biol"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1038\/d41586-021-00525-x","article-title":"Want to track pandemic variants faster? Fix the bioinformatics bottleneck","volume":"591","author":"Hodcroft","year":"2021","journal-title":"Nature"},{"key":"2023041405351887000_","volume-title":"The Art of Computer Programming","author":"Knuth","year":"2011","edition":"3rd edn"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1038\/s41467-020-20880-z","article-title":"Genomic epidemiology of the early stages of the SARS-CoV-2 outbreak in Russia","volume":"12","author":"Komissarov","year":"2021","journal-title":"Nat. Commun"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"812","DOI":"10.1016\/j.cell.2020.06.043","article-title":"Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus","volume":"182","author":"Korber","year":"2020","journal-title":"Cell"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"1547","DOI":"10.1093\/molbev\/msy096","article-title":"MEGA X: molecular evolutionary genetics analysis across computing platforms","volume":"35","author":"Kumar","year":"2018","journal-title":"Mol. Biol. Evol"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1002\/jmv.25723","article-title":"Early phylogenetic estimate of the effective reproduction number of SARS-CoV-2","volume":"92","author":"Lai","year":"2020","journal-title":"J. Med. Virol"},{"key":"2023041405351887000_","first-page":"70","article-title":"Outbreak associated with SARS-CoV-2 B.1.617.2 (delta) variant in an elementary school\u2014Marin County, California, May\u2013June 2021","author":"Lam-Hine","year":"2021","journal-title":"MMWR Morb. Mortal Wkly. Rep"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"2225","DOI":"10.1093\/bioinformatics\/btab102","article-title":"Genozip: a universal extensible genomic data compressor","volume":"37","author":"Lan","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"D1115","DOI":"10.1093\/nar\/gkab959","article-title":"The UCSC genome browser database: 2022 update","volume":"50","author":"Lee","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"The sequence read archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023041405351887000_","volume-title":"Mol. Biol. Evol.,","author":"McBroome","year":"2021"},{"key":"2023041405351887000_","author":"McBroome","year":"2022"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"1530","DOI":"10.1093\/molbev\/msaa015","article-title":"IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era","volume":"37","author":"Minh","year":"2020","journal-title":"Mol. Biol. Evol"},{"key":"2023041405351887000_","volume-title":"MAJORA: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance","author":"Nicholls","year":"2020"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"veab064","DOI":"10.1093\/ve\/veab064","article-title":"Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool","volume":"7","author":"O\u2019Toole","year":"2021","journal-title":"Virus Evol"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"1403","DOI":"10.1038\/s41564-020-0770-5","article-title":"A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology","volume":"5","author":"Rambaut","year":"2020","journal-title":"Nat. Microbiol"},{"key":"2023041405351887000_","volume-title":"A phylogeny-based metric for estimating changes in transmissibility from recurrent mutations in SARS-CoV-2 genomics","author":"Richard","year":"2021"},{"key":"2023041405351887000_","author":"Sanderson","year":"2021"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1137\/0128004","article-title":"Minimal mutation trees of sequences","volume":"28","author":"Sankoff","year":"1975","journal-title":"SIAM J. Appl. Math"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"30494","DOI":"10.2807\/1560-7917.ES.2017.22.13.30494","article-title":"GISAID: global initiative on sharing all influenza data\u2014from vision to reality","volume":"22","author":"Shu","year":"2017","journal-title":"Eurosurveillance"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1093\/oxfordjournals.molbev.a026408","article-title":"Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used","volume":"17","author":"Takahashi","year":"2000","journal-title":"Mol. Biol. Evol"},{"key":"2023041405351887000_","author":"Thornlow","year":"2021"},{"key":"2023041405351887000_","author":"Turakhia","year":"2021"},{"key":"2023041405351887000_","doi-asserted-by":"crossref","first-page":"809","DOI":"10.1038\/s41588-021-00862-7","article-title":"Ultrafast sample placement on existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic","volume":"53","author":"Turakhia","year":"2021","journal-title":"Nat. Genet"},{"key":"2023041405351887000_","volume-title":"Transmission of SARS-CoV-2 lineage B.1.1.7 in England: insights from linking epidemiological and genetic data infectious diseases (except HIV\/AIDS)","author":"Volz","year":"2021"},{"key":"2023041405351887000_","volume-title":"PAUP. Phylogenetic Analysis Using Parsimony (and Other Methods)","author":"Swofford","year":"2003"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac401\/44318848\/btac401.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3734\/49884099\/btac401.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/15\/3734\/49884099\/btac401.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,23]],"date-time":"2023-11-23T10:30:12Z","timestamp":1700735412000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/15\/3734\/6613133"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,22]]},"references-count":44,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2022,8,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac401","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.01.12.475688","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,8,1]]},"published":{"date-parts":[[2022,6,22]]}}}