{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T06:33:45Z","timestamp":1771655625581,"version":"3.50.1"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T00:00:00Z","timestamp":1695340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["82161148009"],"award-info":[{"award-number":["82161148009"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Strategic Priority Research Program of the Chinese Academy of Sciences, China","award":["XDB38030400"],"award-info":[{"award-number":["XDB38030400"]}]},{"name":"Capital Health Development and Research Special Programme","award":["2021-1G-3012"],"award-info":[{"award-number":["2021-1G-3012"]}]},{"name":"Key Collaborative Research Program of the Alliance of International Science Organizations","award":["ANSO-CR-KP-2022-09"],"award-info":[{"award-number":["ANSO-CR-KP-2022-09"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>To contain infectious diseases, it is crucial to determine the origin and transmission routes of the pathogen, as well as how the virus evolves. With the development of genome sequencing technology, genome epidemiology has emerged as a powerful approach for investigating the source and transmission of pathogens. In this study, we first presented the rationale for genomic tracing of SARS-CoV-2 and the challenges we currently face. 
Identifying the most genetically similar reference sequence to the query sequence is a critical step in genome tracing, typically achieved using either a phylogenetic tree or a sequence similarity search. However, these methods become inefficient or computationally prohibitive when dealing with tens of millions of sequences in the reference database, as we encountered during the COVID-19 pandemic. To address this challenge, we developed a novel genomic tracing algorithm capable of processing 6 million SARS-CoV-2 sequences in less than a minute. Instead of constructing a giant phylogenetic tree, we devised a weighted scoring system based on mutation characteristics to quantify sequence similarity. The developed method demonstrated superior performance compared to previous methods. Additionally, an online platform was developed to facilitate genomic tracing and visualization of the spatiotemporal distribution of sequences. The method will be a valuable addition to standard epidemiological investigations, enabling more efficient genomic tracing. 
Furthermore, the computational framework can be easily adapted to other pathogens, paving the way for routine genomic tracing of infectious diseases.<\/jats:p>","DOI":"10.1093\/bib\/bbad339","type":"journal-article","created":{"date-parts":[[2023,10,2]],"date-time":"2023-10-02T04:00:47Z","timestamp":1696219247000},"source":"Crossref","is-referenced-by-count":3,"title":["A fast and accurate method for SARS-CoV-2 genomic tracing"],"prefix":"10.1093","volume":"24","author":[{"given":"Wentai","family":"Ma","sequence":"first","affiliation":[{"name":"Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation , Beijing 100101 , China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leisheng","family":"Shi","sequence":"additional","affiliation":[{"name":"Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation , Beijing 100101 , China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingkun","family":"Li","sequence":"additional","affiliation":[{"name":"Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation , Beijing 100101 , China"},{"name":"University of Chinese Academy of Sciences , Beijing 100049 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,10,1]]},"reference":[{"key":"2023100203585892400_ref1","doi-asserted-by":"crossref","first-page":"1372","DOI":"10.1126\/science.abj4176","article-title":"Durability of mRNA-1273 vaccine-induced antibodies against SARS-CoV-2 
variants","volume":"373","author":"Pegu","year":"2021","journal-title":"Science"},{"key":"2023100203585892400_ref2","doi-asserted-by":"crossref","first-page":"642","DOI":"10.1016\/j.ajpath.2022.01.007","article-title":"Signals of significantly increased vaccine breakthrough, decreased hospitalization rates, and less severe disease in patients with coronavirus disease 2019 caused by the omicron variant of severe acute respiratory syndrome coronavirus 2 in Houston, Texas","volume":"192","author":"Christensen","year":"2022","journal-title":"Am J Pathol"},{"key":"2023100203585892400_ref3","doi-asserted-by":"crossref","first-page":"1861","DOI":"10.1093\/nsr\/nwaa264","article-title":"Cold-chain food contamination as the possible origin of COVID-19 resurgence in Beijing","volume":"7","author":"Pang","year":"2020","journal-title":"Natl Sci Rev"},{"key":"2023100203585892400_ref4","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1038\/nature19790","article-title":"The evolution of Ebola virus: insights from the 2013-2016 epidemic","volume":"538","author":"Holmes","year":"2016","journal-title":"Nature"},{"key":"2023100203585892400_ref5","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nature22402","article-title":"Zika virus evolution and spread in the Americas","volume":"546","author":"Metsky","year":"2017","journal-title":"Nature"},{"key":"2023100203585892400_ref6","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1126\/science.abb9263","article-title":"Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California","volume":"369","author":"Deng","year":"2020","journal-title":"Science"},{"key":"2023100203585892400_ref7","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1016\/j.cell.2020.04.021","article-title":"Coast-to-coast spread of SARS-CoV-2 during the early epidemic in the United 
States","volume":"181","author":"Fauver","year":"2020","journal-title":"Cell"},{"key":"2023100203585892400_ref8","doi-asserted-by":"crossref","first-page":"1777","DOI":"10.1093\/molbev\/msaa314","article-title":"Phylogenetic analysis of SARS-CoV-2 data is difficult","volume":"38","author":"Morel","year":"2021","journal-title":"Mol Biol Evol"},{"key":"2023100203585892400_ref9","doi-asserted-by":"crossref","first-page":"5819","DOI":"10.1093\/molbev\/msab264","article-title":"A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees","volume":"38","author":"McBroome","year":"2021","journal-title":"Mol Biol Evol"},{"key":"2023100203585892400_ref10","doi-asserted-by":"crossref","DOI":"10.2807\/1560-7917.ES.2017.22.13.30494","article-title":"GISAID: global initiative on sharing all influenza data - from vision to reality","volume":"22","author":"Shu","year":"2017","journal-title":"Euro Surveill"},{"key":"2023100203585892400_ref11","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pgph.0000704","article-title":"Genomics-informed outbreak investigations of SARS-CoV-2 using civet.","volume-title":"PLOS Glob Public Health","author":"O\u2019Toole"},{"key":"2023100203585892400_ref12","doi-asserted-by":"crossref","first-page":"3000","DOI":"10.1002\/jmv.26834","article-title":"Evolutionary analysis of SARS-CoV-2 spike protein for its different clades","volume":"93","author":"Pereson","year":"2021","journal-title":"J Med Virol"},{"key":"2023100203585892400_ref13","doi-asserted-by":"crossref","first-page":"104351","DOI":"10.1016\/j.meegid.2020.104351","article-title":"Emergence of genomic diversity and recurrent mutations in SARS-CoV-2","volume":"83","author":"Dorp","year":"2020","journal-title":"Infect Genet Evol"},{"key":"2023100203585892400_ref14","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1186\/s12967-020-02344-6","article-title":"Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase 
variant","volume":"18","author":"Pachetti","year":"2020","journal-title":"J Transl Med"},{"key":"2023100203585892400_ref15","doi-asserted-by":"crossref","first-page":"1530","DOI":"10.1093\/molbev\/msaa015","article-title":"IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era","volume":"37","author":"Minh","year":"2020","journal-title":"Mol Biol Evol"},{"key":"2023100203585892400_ref16","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1038\/nrg2361","article-title":"Linkage disequilibrium - understanding the evolutionary past and mapping the medical future","volume":"9","author":"Slatkin","year":"2008","journal-title":"Nat Rev Genet"},{"key":"2023100203585892400_ref17","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023100203585892400_ref18","doi-asserted-by":"crossref","first-page":"W58","DOI":"10.1093\/nar\/gkw233","article-title":"HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing","volume":"44","author":"Weissensteiner","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023100203585892400_ref19","doi-asserted-by":"crossref","first-page":"587","DOI":"10.2307\/1931577","article-title":"Bird populations of the highlands (North-Carolina) plateau in relation to plant succession and avian invasion","volume":"31","author":"Odum","year":"1950","journal-title":"Ecology"},{"key":"2023100203585892400_ref20","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1038\/s41564-021-00872-5","article-title":"A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology (vol 5, pg 1403, 2020)","volume":"6","author":"Rambaut","year":"2021","journal-title":"Nat 
Microbiol"},{"key":"2023100203585892400_ref21","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-020-18314-x","article-title":"Tracking the COVID-19 pandemic in Australia using genomics","volume":"11","author":"Seemann","year":"2020","journal-title":"Nat Commun"},{"key":"2023100203585892400_ref22","doi-asserted-by":"crossref","first-page":"e2024191","DOI":"10.1001\/jamanetworkopen.2020.24191","article-title":"Analysis of genomic characteristics and transmission routes of patients with confirmed SARS-CoV-2 in Southern California during the early stage of the US COVID-19 pandemic","volume":"3","author":"Zhang","year":"2020","journal-title":"JAMA Netw Open"},{"key":"2023100203585892400_ref23","doi-asserted-by":"crossref","DOI":"10.1126\/scitranslmed.abf0202","article-title":"Viral genomes reveal patterns of the SARS-CoV-2 outbreak in Washington state","volume":"13","author":"Muller","year":"2021","journal-title":"Sci Transl Med"},{"key":"2023100203585892400_ref24","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1126\/science.abe3261","article-title":"Phylogenetic analysis of SARS-CoV-2 in Boston highlights the impact of superspreading events","volume":"371","author":"Lemieux","year":"2021","journal-title":"Science"},{"key":"2023100203585892400_ref25","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1089\/cmb.2020.0343","article-title":"Multiscale feedback loops in SARS-CoV-2 viral evolution","volume":"28","author":"Barrett","year":"2021","journal-title":"J Comput Biol"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad339\/51822865\/bbad339.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad339\/51822865\/bbad339.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,2]],"date-time":"2023-10-02T04:01:11Z","timestamp":1696219271000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad339\/7287429"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,22]]},"references-count":25,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,9,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad339","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,9,22]]},"article-number":"bbad339"}}