{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T19:52:40Z","timestamp":1777751560302,"version":"3.51.4"},"reference-count":105,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,3,15]],"date-time":"2024-03-15T00:00:00Z","timestamp":1710460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. 
Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.<\/jats:p>","DOI":"10.3389\/fbinf.2024.1278228","type":"journal-article","created":{"date-parts":[[2024,3,15]],"date-time":"2024-03-15T04:39:58Z","timestamp":1710477598000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":64,"title":["Ten common issues with reference sequence databases and how to mitigate them"],"prefix":"10.3389","volume":"4","author":[{"given":"Samuel D.","family":"Chorlton","sequence":"first","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,3,15]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1016\/j.cels.2015.07.008","article-title":"Lack of evidence for plague or anthrax on the New York city subway","volume":"1","author":"Ackelsberg","year":"2015","journal-title":"Cell. Syst."},{"key":"B2","doi-asserted-by":"publisher","first-page":"eabl3533","DOI":"10.1126\/science.abl3533","article-title":"A complete reference genome improves analysis of human genetic variation","volume":"376","author":"Aganezov","year":"2022","journal-title":"Science"},{"key":"B3","doi-asserted-by":"publisher","first-page":"D898","DOI":"10.1093\/nar\/gkab929","article-title":"VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center","volume":"50","author":"Amos","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"B4","doi-asserted-by":"publisher","first-page":"e0115813","DOI":"10.1371\/journal.pone.0115813","article-title":"Strategies to avoid wrongly labelled genomes using as example the detected wrong taxonomic affiliation for Aeromonas genomes in the GenBank database","volume":"10","author":"Beaz-Hidalgo","year":"2015","journal-title":"PLOS ONE"},{"key":"B5","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1111\/1574-6976.12015","article-title":"The future is now: single-cell genomics of 
bacteria and archaea","volume":"37","author":"Blainey","year":"2013","journal-title":"FEMS Microbiol. Rev."},{"key":"B6","doi-asserted-by":"publisher","first-page":"1633","DOI":"10.1038\/s41587-023-01688-w","article-title":"Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4","volume":"41","author":"Blanco-M\u00edguez","year":"2023","journal-title":"Nat. Biotechnol."},{"key":"B7","doi-asserted-by":"publisher","first-page":"725","DOI":"10.1038\/nbt.3893","article-title":"Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea","volume":"35","author":"Bowers","year":"2017","journal-title":"Nat. Biotechnol."},{"key":"B8","doi-asserted-by":"publisher","first-page":"198","DOI":"10.1186\/s13059-018-1568-0","article-title":"KrakenUniq: confident and fast metagenomics classification using unique k-mer counts","volume":"19","author":"Breitwieser","year":"2018","journal-title":"Genome Biol."},{"key":"B9","doi-asserted-by":"publisher","first-page":"954","DOI":"10.1101\/gr.245373.118","article-title":"Human contamination in bacterial genomes has created thousands of spurious proteins","volume":"29","author":"Breitwieser","year":"2019","journal-title":"Genome Res."},{"key":"B10","doi-asserted-by":"publisher","first-page":"giaa008","DOI":"10.1093\/gigascience\/giaa008","article-title":"GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms","volume":"9","author":"Browne","year":"2020","journal-title":"GigaScience"},{"key":"B11","doi-asserted-by":"publisher","first-page":"mgen000393","DOI":"10.1099\/mgen.0.000393","article-title":"Evaluation of methods for detecting human reads in microbial sequencing datasets","volume":"6","author":"Bush","year":"2020","journal-title":"Microb. 
Genom"},{"key":"B12","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinforma."},{"key":"B13","doi-asserted-by":"publisher","first-page":"100463","DOI":"10.1016\/j.crmeth.2023.100463","article-title":"A CRISPR-enhanced metagenomic NGS test to improve pandemic preparedness","volume":"3","author":"Chan","year":"2023","journal-title":"Cell. Rep. Methods"},{"key":"B14","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1038\/s42003-022-03114-4","article-title":"BugSplit enables genome-resolved metagenomics through highly accurate taxonomic binning of metagenomic assemblies","volume":"5","author":"Chandrakumar","year":"2022","journal-title":"Commun. Biol."},{"key":"B15","doi-asserted-by":"publisher","first-page":"e0090722","DOI":"10.1128\/msystems.00907-22","article-title":"Elimination of foreign sequences in eukaryotic viral reference genomes improves the accuracy of virome analysis","volume":"7","author":"Chen","year":"2022","journal-title":"mSystems"},{"key":"B16","doi-asserted-by":"publisher","first-page":"e62856","DOI":"10.1371\/journal.pone.0062856","article-title":"Effects of GC bias in next-generation-sequencing data on de novo genome assembly","volume":"8","author":"Chen","year":"2013","journal-title":"PLOS ONE"},{"key":"B17","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1038\/s41576-019-0113-7","article-title":"Clinical metagenomics","volume":"20","author":"Chiu","year":"2019","journal-title":"Nat. Rev. Genet."},{"key":"B18","doi-asserted-by":"publisher","first-page":"2386","DOI":"10.1099\/ijsem.0.002809","article-title":"Using average nucleotide identity to improve taxonomic assignments in prokaryotic genomes at the NCBI","volume":"68","author":"Ciufo","year":"2018","journal-title":"Int. J. Syst. Evol. 
Microbiol."},{"key":"B19","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1186\/s13059-022-02619-9","article-title":"Contamination detection in genomic data: more is not enough","volume":"23","author":"Cornet","year":"2022","journal-title":"Genome Biol."},{"key":"B20","doi-asserted-by":"publisher","first-page":"e0200323","DOI":"10.1371\/journal.pone.0200323","article-title":"Consensus assessment of the contamination level of publicly available cyanobacterial genomes","volume":"13","author":"Cornet","year":"2018","journal-title":"PLOS ONE"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1255159","DOI":"10.3389\/fbinf.2023.1255159","article-title":"Better research software tools to elevate the rate of scientific discovery or why we need to invest in research software engineering","volume":"3","author":"Deschamps","year":"2023","journal-title":"Front. Bioinforma."},{"key":"B22","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1016\/j.nmni.2017.09.003","article-title":"Accurate differentiation of Escherichia coli and Shigella serogroups: challenges and strategies","volume":"21","author":"Devanga Ragupathi","year":"2018","journal-title":"New Microbes New Infect."},{"key":"B23","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1186\/s12859-021-04089-5","article-title":"BugSeq: a highly accurate cloud platform for long-read metagenomic analyses","volume":"22","author":"Fan","year":"2021","journal-title":"BMC Bioinforma."},{"key":"B24","doi-asserted-by":"publisher","first-page":"D1086","DOI":"10.1093\/nar\/gku1127","article-title":"Type material in the NCBI taxonomy database","volume":"43","author":"Federhen","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"B25","article-title":"Empirical study on software and process quality in bioinformatics tools","volume":"2022","author":"Ferenc","year":"2022"},{"key":"B26","doi-asserted-by":"publisher","first-page":"e28819","DOI":"10.1371\/journal.pone.0028819","article-title":"Gentle 
masking of low-complexity sequences improves homology search","volume":"6","author":"Frith","year":"2011","journal-title":"PLoS One"},{"key":"B27","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1186\/1471-2105-11-80","article-title":"Parameters for accurate genome alignment","volume":"11","author":"Frith","year":"2010","journal-title":"BMC Bioinforma."},{"key":"B28","doi-asserted-by":"publisher","first-page":"e0011922","DOI":"10.1128\/cmr.00119-22","article-title":"Agnostic sequencing for detection of viral pathogens","volume":"36","author":"Gauthier","year":"2023","journal-title":"Clin. Microbiol. Rev."},{"key":"B29","first-page":"2023","article-title":"Major data analysis errors invalidate cancer microbiome findings","author":"Gihawi","year":"2023"},{"key":"B30","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1038\/nm.4517","article-title":"Current understanding of the human microbiome","volume":"24","author":"Gilbert","year":"2018","journal-title":"Nat. Med."},{"key":"B31","volume-title":"Possibility to preapare a new nt database \u00b7 Issue #227 \u00b7 DaehwanKimLab\/centrifuge","year":"2022"},{"key":"B32","doi-asserted-by":"publisher","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"B33","doi-asserted-by":"publisher","first-page":"D851","DOI":"10.1093\/nar\/gkx1068","article-title":"RefSeq: an update on prokaryotic genome annotation and curation","volume":"46","author":"Haft","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"B34","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1016\/s0140-6736(20)30183-5","article-title":"Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China","volume":"395","author":"Huang","year":"2020","journal-title":"Lancet"},{"key":"B35","first-page":"2023","volume-title":"miniBUSCO: a faster and 
more accurate reimplementation of BUSCO","author":"Huang","year":"2023"},{"key":"B36","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1186\/s13059-015-0849-0","article-title":"Circlator: automated circularization of genome assemblies using long sequencing reads","volume":"16","author":"Hunt","year":"2015","journal-title":"Genome Biol."},{"key":"B37","doi-asserted-by":"publisher","first-page":"e281","DOI":"10.1016\/s1473-3099(20)30939-7","article-title":"Genomic-informed pathogen surveillance in Africa: opportunities and challenges","volume":"21","author":"Inzaule","year":"2021","journal-title":"Lancet Infect. Dis."},{"key":"B38","doi-asserted-by":"publisher","first-page":"2761","DOI":"10.1128\/jcm.01228-07","article-title":"16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls","volume":"45","author":"Janda","year":"2007","journal-title":"J. Clin. Microbiol."},{"key":"B39","doi-asserted-by":"publisher","first-page":"119","DOI":"10.12688\/wellcomeopenres.15806.1","article-title":"Ethical challenges in pathogen sequencing: a systematic scoping review","volume":"5","author":"Johnson","year":"2020","journal-title":"Wellcome Open Res."},{"key":"B40","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1038\/s43705-022-00092-w","article-title":"MIxS-SA: a MIxS extension defining the minimum information standard for sequence data from symbiont-associated micro-organisms","volume":"2","author":"Jorge","year":"2022","journal-title":"ISME Commun."},{"key":"B41","doi-asserted-by":"publisher","first-page":"giaa111","DOI":"10.1093\/gigascience\/giaa111","article-title":"IDseq\u2014an open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring","volume":"9","author":"Kalantar","year":"2020","journal-title":"GigaScience"},{"key":"B42","doi-asserted-by":"publisher","first-page":"005707","DOI":"10.1099\/ijsem.0.005707","article-title":"Collection and curation of 
prokaryotic genome assemblies from type strains at NCBI","volume":"73","author":"Kannan","year":"2023","journal-title":"Int. J. Syst. Evol. Microbiol."},{"key":"B43","article-title":"MetaGraph: indexing and analysing nucleotide archives at petabase-scale","volume":"2020","author":"Karasikov","year":"2020"},{"key":"B44","doi-asserted-by":"publisher","first-page":"1721","DOI":"10.1101\/gr.210641.116","article-title":"Centrifuge: rapid and sensitive classification of metagenomic sequences","volume":"26","author":"Kim","year":"2016","journal-title":"Genome Res."},{"key":"B45","doi-asserted-by":"publisher","first-page":"1229","DOI":"10.1007\/s00705-013-1877-2","article-title":"Virus nomenclature below the species level: a standardized nomenclature for filovirus strains and variants rescued from cDNA","volume":"159","author":"Kuhn","year":"2014","journal-title":"Arch. Virol."},{"key":"B46","doi-asserted-by":"publisher","first-page":"e01360","DOI":"10.1128\/mbio.01360-14","article-title":"Standards for sequencing viral genomes in the era of high-throughput sequencing","volume":"5","author":"Ladner","year":"2014","journal-title":"mBio"},{"key":"B47","doi-asserted-by":"publisher","first-page":"2741","DOI":"10.1093\/nar\/20.11.2741","article-title":"Corruption of genomic databases with anomalous sequence","volume":"20","author":"Lamperti","year":"1992","journal-title":"Nucleic Acids Res."},{"key":"B48","doi-asserted-by":"publisher","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"B49","doi-asserted-by":"publisher","first-page":"e1006277","DOI":"10.1371\/journal.pcbi.1006277","article-title":"Removing contaminants from databases of draft genomes","volume":"14","author":"Lu","year":"2018","journal-title":"PLOS Comput. 
Biol."},{"key":"B50","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1038\/s41564-021-00888-x","article-title":"Fungal taxonomy and sequence-based nomenclature","volume":"6","author":"L\u00fccking","year":"2021","journal-title":"Nat. Microbiol."},{"key":"B51","doi-asserted-by":"publisher","first-page":"755101","DOI":"10.3389\/fmicb.2021.755101","article-title":"Contamination in reference sequence databases: time for divide-and-rule tactics","volume":"12","author":"Lupo","year":"2021","journal-title":"Front. Microbiol."},{"key":"B52","doi-asserted-by":"publisher","first-page":"4647","DOI":"10.1093\/molbev\/msab199","article-title":"BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes","volume":"38","author":"Manni","year":"2021","journal-title":"Mol. Biol. Evol."},{"key":"B53","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12864-020-6592-2","article-title":"The use of taxon-specific reference databases compromises metagenomic classification","volume":"21","author":"Marcelino","year":"2020","journal-title":"BMC Genomics"},{"key":"B54","doi-asserted-by":"publisher","first-page":"e675","DOI":"10.7717\/peerj.675","article-title":"Unexpected cross-species contamination in genome sequencing projects","volume":"2","author":"Merchant","year":"2014","journal-title":"PeerJ"},{"key":"B55","first-page":"712166","article-title":"Correcting index databases improves metagenomic studies","author":"M\u00e9ric","year":"2019"},{"key":"B56","doi-asserted-by":"publisher","first-page":"1028","DOI":"10.1089\/cmb.2006.13.1028","article-title":"A fast and symmetric DUST implementation to mask low-complexity DNA sequences","volume":"13","author":"Morgulis","year":"2006","journal-title":"J. Comput. 
Biol."},{"key":"B57","doi-asserted-by":"publisher","first-page":"20097","DOI":"10.1038\/s41598-019-56552-2","article-title":"Respiratory syncytial virus A genotype classification based on systematic intergenotypic and intragenotypic sequence analysis","volume":"9","author":"Mu\u00f1oz-Escalante","year":"2019","journal-title":"Sci. Rep."},{"key":"B58","doi-asserted-by":"publisher","first-page":"e01309-20","DOI":"10.1128\/JCM.01309-20","article-title":"Summary of novel bacterial isolates derived from human clinical specimens and nomenclature revisions published in 2018 and 2019","volume":"61","author":"Munson","year":"2022","journal-title":"J. Clin. Microbiol."},{"key":"B59","doi-asserted-by":"publisher","first-page":"1180","DOI":"10.1101\/gr.171934.113","article-title":"A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples","volume":"24","author":"Naccache","year":"2014","journal-title":"Genome Res."},{"key":"B60","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1186\/s13059-018-1554-6","article-title":"RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification","volume":"19","author":"Nasko","year":"2018","journal-title":"Genome Biol."},{"key":"B61","doi-asserted-by":"publisher","first-page":"578","DOI":"10.1038\/s41587-020-00774-7","article-title":"CheckV assesses the quality and completeness of metagenome-assembled viral genomes","volume":"39","author":"Nayfach","year":"2021","journal-title":"Nat. 
Biotechnol."},{"key":"B62","doi-asserted-by":"publisher","first-page":"854","DOI":"10.1016\/j.cell.2016.04.008","article-title":"Temporal stability of the human skin microbiome","volume":"165","author":"Oh","year":"2016","journal-title":"Cell."},{"key":"B63","doi-asserted-by":"publisher","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O'Leary","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"B64","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1186\/s13059-021-02393-0","article-title":"GUNC: detection of chimerism and contamination in prokaryotic genomes","volume":"22","author":"Orakov","year":"2021","journal-title":"Genome Biol."},{"key":"B65","doi-asserted-by":"publisher","first-page":"834","DOI":"10.3389\/fmicb.2019.00834","article-title":"Large-scale genomics reveals the genetic characteristics of seven species and importance of phylogenetic distance for estimating pan-genome size","volume":"10","author":"Park","year":"2019","journal-title":"Front. Microbiol."},{"key":"B66","doi-asserted-by":"publisher","first-page":"1079","DOI":"10.1038\/s41587-020-0501-8","article-title":"A complete domain-to-species taxonomy for Bacteria and Archaea","volume":"38","author":"Parks","year":"2020","journal-title":"Nat. 
Biotechnol."},{"key":"B67","doi-asserted-by":"publisher","first-page":"D785","DOI":"10.1093\/nar\/gkab776","article-title":"GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy","volume":"50","author":"Parks","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"B68","doi-asserted-by":"publisher","first-page":"1043","DOI":"10.1101\/gr.186072.114","article-title":"CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes","volume":"25","author":"Parks","year":"2015","journal-title":"Genome Res."},{"key":"B69","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1373\/clinchem.2014.221770","article-title":"MALDI-TOF MS for the diagnosis of infectious diseases","volume":"61","author":"Patel","year":"2015","journal-title":"Clin. Chem."},{"key":"B70","doi-asserted-by":"publisher","first-page":"i12","DOI":"10.1093\/bioinformatics\/btaa458","article-title":"ganon: precise metagenomics classification against large and up-to-date sets of reference sequences","volume":"36","author":"Piro","year":"2020","journal-title":"Bioinformatics"},{"key":"B71","doi-asserted-by":"publisher","first-page":"431","DOI":"10.3390\/v15020431","article-title":"Enhanced viral metagenomics with lazypipe 2","volume":"15","author":"Plyusnin","year":"2023","journal-title":"Viruses"},{"key":"B72","doi-asserted-by":"publisher","first-page":"D590","DOI":"10.1093\/nar\/gks1219","article-title":"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools","volume":"41","author":"Quast","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"B73","doi-asserted-by":"publisher","first-page":"lqab071","DOI":"10.1093\/nargab\/lqab071","article-title":"CONSULT: accurate contamination removal using locality-sensitive hashing","volume":"3","author":"Rachtman","year":"2021","journal-title":"NAR Genomics 
Bioinforma."},{"key":"B74","doi-asserted-by":"publisher","first-page":"veaa052","DOI":"10.1093\/ve\/veaa052","article-title":"Towards a unified classification for human respiratory syncytial virus genotypes","volume":"6","author":"Ramaekers","year":"2020","journal-title":"Virus Evol."},{"key":"B75","doi-asserted-by":"publisher","first-page":"e000435","DOI":"10.1099\/mgen.0.000435","article-title":"Universal whole-sequence-based plasmid typing and its utility to prediction of host range and epidemiological surveillance","volume":"6","author":"Robertson","year":"2020","journal-title":"Microb. Genomics"},{"key":"B76","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1038\/nbt.4306","article-title":"Minimum information about an uncultivated virus genome (MIUViG)","volume":"37","author":"Roux","year":"2019","journal-title":"Nat. Biotechnol."},{"key":"B77","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1186\/s12859-023-05492-w","article-title":"HoCoRT: host contamination removal tool","volume":"24","author":"Rumbavicius","year":"2023","journal-title":"BMC Bioinforma."},{"key":"B78","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1186\/s13059-020-02155-4","article-title":"Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC","volume":"21","author":"Saary","year":"2020","journal-title":"Genome Biol."},{"key":"B79","doi-asserted-by":"publisher","first-page":"755","DOI":"10.1093\/bioinformatics\/btx669","article-title":"VecScreen_plus_taxonomy: imposing a tax(onomy) increase on vector contamination screening","volume":"34","author":"Sch\u00e4ffer","year":"2018","journal-title":"Bioinformatics"},{"key":"B80","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1101\/gr.213611.116","article-title":"Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly","volume":"27","author":"Schneider","year":"2017","journal-title":"Genome 
Res."},{"key":"B81","doi-asserted-by":"publisher","first-page":"baaa062","DOI":"10.1093\/database\/baaa062","article-title":"NCBI Taxonomy: a comprehensive update on curation, resources and tools","volume":"2020","author":"Schoch","year":"2020","journal-title":"Database"},{"key":"B82","doi-asserted-by":"publisher","first-page":"527102","DOI":"10.3389\/fcimb.2020.527102","article-title":"The most frequently used sequencing Technologies and assembly methods in different time segments of the bacterial surveillance and RefSeq genome databases","volume":"10","author":"Segerman","year":"2020","journal-title":"Front. Cell. Infect. Microbiol."},{"key":"B83","doi-asserted-by":"publisher","first-page":"823","DOI":"10.1038\/s41592-022-01539-7","article-title":"Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing","volume":"19","author":"Sereika","year":"2022","journal-title":"Nat. Methods"},{"key":"B84","doi-asserted-by":"publisher","first-page":"980","DOI":"10.1089\/cmb.2005.12.980","article-title":"Correcting BLAST e-values for low-complexity segments","volume":"12","author":"Sharon","year":"2005","journal-title":"J. Comput. Biol."},{"key":"B85","doi-asserted-by":"publisher","first-page":"btac845","DOI":"10.1093\/bioinformatics\/btac845","article-title":"KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping","volume":"39","author":"Shen","year":"2023","journal-title":"Bioinformatics"},{"key":"B86","doi-asserted-by":"publisher","first-page":"3313","DOI":"10.1038\/s41467-019-11306-6","article-title":"FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science","volume":"10","author":"Sichtig","year":"2019","journal-title":"Nat. 
Commun."},{"key":"B87","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1186\/s13059-020-02023-1","article-title":"Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank","volume":"21","author":"Steinegger","year":"2020","journal-title":"Genome Biol."},{"key":"B88","doi-asserted-by":"publisher","first-page":"2542","DOI":"10.1038\/s41467-018-04964-5","article-title":"Clustering huge protein sequence sets in linear time","volume":"9","author":"Steinegger","year":"2018","journal-title":"Nat. Commun."},{"key":"B89","doi-asserted-by":"publisher","first-page":"870","DOI":"10.1038\/s41467-018-03317-6","article-title":"Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen","volume":"9","author":"Stewart","year":"2018","journal-title":"Nat. Commun."},{"key":"B90","doi-asserted-by":"publisher","first-page":"1196","DOI":"10.1038\/nmeth.2693","article-title":"Metagenomic species profiling using universal phylogenetic marker genes","volume":"10","author":"Sunagawa","year":"2013","journal-title":"Nat. Methods"},{"key":"B91","doi-asserted-by":"publisher","first-page":"1079","DOI":"10.1038\/s41564-023-01381-3","article-title":"Reconstruction of the personal information from human genome reads in gut metagenome sequencing data","volume":"8","author":"Tomofuji","year":"2023","journal-title":"Nat. Microbiol."},{"key":"B92","doi-asserted-by":"publisher","first-page":"805","DOI":"10.1038\/nrg1709","article-title":"Metagenomics: DNA sequencing of environmental samples","volume":"6","author":"Tringe","year":"2005","journal-title":"Nat. Rev. Genet."},{"key":"B93","doi-asserted-by":"publisher","first-page":"e006597","DOI":"10.1136\/bmjgh-2021-006597","article-title":"The economics of improving global infectious disease surveillance","volume":"6","author":"Vries","year":"2021","journal-title":"BMJ Glob. 
Health"},{"key":"B94","doi-asserted-by":"publisher","first-page":"2633","DOI":"10.1007\/s00705-021-05156-1","article-title":"Changes to virus taxonomy and to the international code of virus classification and nomenclature ratified by the international committee on taxonomy of viruses (2021)","volume":"166","author":"Walker","year":"2021","journal-title":"Arch. Virol."},{"key":"B95","doi-asserted-by":"publisher","first-page":"e1006583","DOI":"10.1371\/journal.pcbi.1006583","article-title":"Deepbinner: demultiplexing barcoded Oxford Nanopore reads with deep convolutional neural networks","volume":"14","author":"Wick","year":"2018","journal-title":"PLOS Comput. Biol."},{"key":"B96","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome Biol."},{"key":"B97","doi-asserted-by":"publisher","first-page":"mgen000949","DOI":"10.1099\/mgen.0.000949","article-title":"From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools","volume":"9","author":"Wright","year":"2023","journal-title":"Microb. Genom"},{"key":"B98","doi-asserted-by":"publisher","first-page":"2225","DOI":"10.3389\/fmicb.2018.02225","article-title":"Detection of viral pathogens with multiplex Nanopore MinION sequencing: Be careful with cross-talk","volume":"9","author":"Xu","year":"2018","journal-title":"Front. 
Microbiol."},{"key":"B99","doi-asserted-by":"publisher","first-page":"779","DOI":"10.1016\/j.cell.2019.07.010","article-title":"Benchmarking metagenomics tools for taxonomic classification","volume":"178","author":"Ye","year":"2019","journal-title":"Cell."},{"key":"B100","doi-asserted-by":"publisher","first-page":"12941","DOI":"10.1038\/s41598-021-92435-1","article-title":"Genetic diversity and molecular evolution of human respiratory syncytial virus A and B","volume":"11","author":"Yu","year":"2021","journal-title":"Sci. Rep."},{"key":"B101","first-page":"2023","volume-title":"Comprehensive Assessment of Eleven de novo HiFi Assemblers on Complex Eukaryotic Genomes and Metagenomes","author":"Yu","year":"2023"},{"key":"B102","doi-asserted-by":"publisher","first-page":"i35","DOI":"10.1093\/bioinformatics\/btv231","article-title":"Reconstructing 16S rRNA genes in metagenomic data","volume":"31","author":"Yuan","year":"2015","journal-title":"Bioinformatics"},{"key":"B103","volume-title":"Faster and more accurate sequence alignment with SNAP","author":"Zaharia","year":"2011"},{"key":"B104","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1186\/s12864-015-1308-8","article-title":"A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification","volume":"16","author":"Zhao","year":"2015","journal-title":"BMC Genomics"},{"key":"B105","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s40168-018-0399-2","article-title":"ReprDB and panDB: minimalist databases with maximal microbial representation","volume":"6","author":"Zhou","year":"2018","journal-title":"Microbiome"}],"container-title":["Frontiers in 
Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1278228\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,15]],"date-time":"2024-03-15T04:40:03Z","timestamp":1710477603000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1278228\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,15]]},"references-count":105,"alternative-id":["10.3389\/fbinf.2024.1278228"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2024.1278228","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,15]]},"article-number":"1278228"}}