{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T13:53:42Z","timestamp":1761746022487,"version":"3.41.2"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2024,10,10]],"date-time":"2024-10-10T00:00:00Z","timestamp":1728518400000},"content-version":"vor","delay-in-days":9,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Canadian Safety and Security Program","award":["CSSP-2022-CP-2538"],"award-info":[{"award-number":["CSSP-2022-CP-2538"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>State-of-the-art tools for classifying metagenomic sequencing reads provide both rapid and accurate options, although the combination of both in a single tool is a constantly improving area of research. The machine learning-based Na\u00efve Bayes Classifier (NBC) approach provides a theoretical basis for accurate classification of all reads in a sample.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We developed the multithreaded Minimizer-based Na\u00efve Bayes Classifier (MNBC) tool to improve the NBC approach by applying minimizers, as well as plurality voting for closely related classification scores. A standard reference- and test-sequence framework using simulated variable-length reads benchmarked MNBC with six other state-of-the-art tools: MetaMaps, Ganon, Kraken2, KrakenUniq, CLARK, and Centrifuge. 
We also applied MNBC to the \u201cmarine\u201d and \u201cstrain-madness\u201d short-read metagenomic datasets in the Critical Assessment of Metagenome Interpretation (CAMI) II challenge using a corresponding database from the time. MNBC efficiently identified reads from unknown microorganisms, and exhibited the highest species- and genus-level precision and recall on short reads, as well as the highest species-level precision on long reads. It also achieved the highest accuracy on the \u201cstrain-madness\u201d dataset.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>MNBC is freely available at: https:\/\/github.com\/ComputationalPathogens\/MNBC.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae601","type":"journal-article","created":{"date-parts":[[2024,10,10]],"date-time":"2024-10-10T15:35:42Z","timestamp":1728574542000},"source":"Crossref","is-referenced-by-count":2,"title":["MNBC: a multithreaded Minimizer-based Na\u00efve Bayes Classifier for improved metagenomic sequence classification"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-3878-2880","authenticated-orcid":false,"given":"Ruipeng","family":"Lu","sequence":"first","affiliation":[{"name":"National Centre for Animal Disease, Canadian Food Inspection Agency , Lethbridge County, AB, T1J 5R7,","place":["Canada"]}]},{"given":"Tim","family":"Dumonceaux","sequence":"additional","affiliation":[{"name":"Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada , Saskatoon, SK, S7N 0X2,","place":["Canada"]}]},{"given":"Muhammad","family":"Anzar","sequence":"additional","affiliation":[{"name":"Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada , Saskatoon, SK, S7N 
0X2,","place":["Canada"]}]},{"given":"Athanasios","family":"Zovoilis","sequence":"additional","affiliation":[{"name":"Department of Biochemistry and Medical Genetics, University of Manitoba , Winnipeg, MB, R3E 0J9,","place":["Canada"]}]},{"given":"Kym","family":"Antonation","sequence":"additional","affiliation":[{"name":"National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada , Winnipeg, MB, R3E 3M4,","place":["Canada"]}]},{"given":"Dillon","family":"Barker","sequence":"additional","affiliation":[{"name":"National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada , Winnipeg, MB, R3E 3M4,","place":["Canada"]}]},{"given":"Cindi","family":"Corbett","sequence":"additional","affiliation":[{"name":"National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada , Winnipeg, MB, R3E 3M4,","place":["Canada"]}]},{"given":"Celine","family":"Nadon","sequence":"additional","affiliation":[{"name":"National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada , Winnipeg, MB, R3E 3M4,","place":["Canada"]}]},{"given":"James","family":"Robertson","sequence":"additional","affiliation":[{"name":"National Microbiology Laboratory at Guelph, Public Health Agency of Canada , Guelph, ON, N1G 3W4,","place":["Canada"]}]},{"given":"Shannon H C","family":"Eagle","sequence":"additional","affiliation":[{"name":"National Microbiology Laboratory at Guelph, Public Health Agency of Canada , Guelph, ON, N1G 3W4,","place":["Canada"]}]},{"given":"Oliver","family":"Lung","sequence":"additional","affiliation":[{"name":"National Centre for Foreign Animal Disease, Canadian Food Inspection Agency , Winnipeg, MB, R3E 3M4,","place":["Canada"]}]},{"given":"Josip","family":"Rudar","sequence":"additional","affiliation":[{"name":"National Centre for Foreign Animal Disease, Canadian Food Inspection Agency , Winnipeg, MB, R3E 3M4,","place":["Canada"]}]},{"given":"Om","family":"Surujballi","sequence":"additional","affiliation":[{"name":"Ottawa Animal 
Health Laboratory, Canadian Food Inspection Agency , Ottawa, ON, K2J 4S1,","place":["Canada"]}]},{"given":"Chad","family":"Laing","sequence":"additional","affiliation":[{"name":"National Centre for Animal Disease, Canadian Food Inspection Agency , Lethbridge County, AB, T1J 5R7,","place":["Canada"]}]}],"member":"286","published-online":{"date-parts":[[2024,10,10]]},"reference":[{"key":"2024103010381648700_btae601-B1","first-page":"1111","article-title":"A reliable effective terascale linear learning system","volume":"15","author":"Agarwal","year":"2014","journal-title":"J Mach Learn Res"},{"key":"2024103010381648700_btae601-B2","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1186\/1471-2105-13-92","article-title":"A comparative evaluation of sequence classification programs","volume":"13","author":"Bazinet","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2024103010381648700_btae601-B3","doi-asserted-by":"crossref","first-page":"1633","DOI":"10.1038\/s41587-023-01688-w","article-title":"Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4","volume":"41","author":"Blanco-M\u00edguez","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024103010381648700_btae601-B4","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1038\/nmeth.1358","article-title":"Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models","volume":"6","author":"Brady","year":"2009","journal-title":"Nat Methods"},{"key":"2024103010381648700_btae601-B5","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1186\/s13059-018-1568-0","article-title":"KrakenUniq: confident and fast metagenomics classification using unique k-mer counts","volume":"19","author":"Breitwieser","year":"2018","journal-title":"Genome 
Biol"},{"first-page":"21","year":"1997","author":"Broder","key":"2024103010381648700_btae601-B6"},{"year":"1994","author":"Burrows","key":"2024103010381648700_btae601-B7"},{"key":"2024103010381648700_btae601-B8","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach Learn"},{"key":"2024103010381648700_btae601-B9","doi-asserted-by":"crossref","first-page":"i766","DOI":"10.1093\/bioinformatics\/bty567","article-title":"DREAM-Yara: an exact read mapper for very large databases with short update time","volume":"34","author":"Dadi","year":"2018","journal-title":"Bioinformatics"},{"key":"2024103010381648700_btae601-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J R Stat Soc Ser B Methodol"},{"key":"2024103010381648700_btae601-B11","doi-asserted-by":"crossref","first-page":"3066","DOI":"10.1038\/s41467-019-10934-2","article-title":"Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps","volume":"10","author":"Dilthey","year":"2019","journal-title":"Nat Commun"},{"volume-title":"An Introduction to Probability Theory and Its 
Applications","year":"1991","author":"Feller","key":"2024103010381648700_btae601-B12"},{"first-page":"390","year":"2000","author":"Ferragina","key":"2024103010381648700_btae601-B13"},{"first-page":"127","year":"2007","author":"Flajolet","key":"2024103010381648700_btae601-B14"},{"year":"2020","author":"Fritz","key":"2024103010381648700_btae601-B15","doi-asserted-by":"publisher","DOI":"10.4126\/FRL01-006425521"},{"first-page":"683","year":"2013","author":"Heule","key":"2024103010381648700_btae601-B16"},{"first-page":"240","year":"1991","author":"Jokinen","key":"2024103010381648700_btae601-B17"},{"key":"2024103010381648700_btae601-B18","doi-asserted-by":"crossref","first-page":"1721","DOI":"10.1101\/gr.210641.116","article-title":"Centrifuge: rapid and sensitive classification of metagenomic sequences","volume":"26","author":"Kim","year":"2016","journal-title":"Genome Res"},{"year":"2007","author":"Langford","key":"2024103010381648700_btae601-B19"},{"key":"2024103010381648700_btae601-B20","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat Methods"},{"key":"2024103010381648700_btae601-B21","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2024103010381648700_btae601-B22","doi-asserted-by":"crossref","first-page":"lqaa009","DOI":"10.1093\/nargab\/lqaa009","article-title":"DeepMicrobes: taxonomic classification for metagenomics with deep learning","volume":"2","author":"Liang","year":"2020","journal-title":"NAR Genomics Bioinforma"},{"key":"2024103010381648700_btae601-B23","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2164-12-S2-S4","article-title":"Accurate and fast estimation of 
taxonomic profiles from metagenomic shotgun sequences","volume":"12","author":"Liu","year":"2011","journal-title":"BMC Genomics"},{"key":"2024103010381648700_btae601-B24","doi-asserted-by":"crossref","first-page":"e0283536","DOI":"10.1371\/journal.pone.0283536","article-title":"MT-MAG: accurate and interpretable machine learning for complete or partial taxonomic assignments of metagenome-assembled genomes","volume":"18","author":"Li","year":"2023","journal-title":"PLoS One"},{"key":"2024103010381648700_btae601-B25","doi-asserted-by":"crossref","first-page":"W20","DOI":"10.1093\/nar\/gkh435","article-title":"BLAST: at the core of a powerful and diverse set of sequence analysis tools","volume":"32","author":"McGinnis","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024103010381648700_btae601-B26","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1186\/s13059-017-1299-7","article-title":"Comprehensive benchmarking and ensemble approaches for metagenomic classifiers","volume":"18","author":"McIntyre","year":"2017","journal-title":"Genome Biol"},{"key":"2024103010381648700_btae601-B27","doi-asserted-by":"crossref","first-page":"11257","DOI":"10.1038\/ncomms11257","article-title":"Fast and sensitive taxonomic classification for metagenomics with Kaiju","volume":"7","author":"Menzel","year":"2016","journal-title":"Nat Commun"},{"key":"2024103010381648700_btae601-B28","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1038\/s41592-022-01431-4","article-title":"Critical assessment of metagenome interpretation: the second round of challenges","volume":"19","author":"Meyer","year":"2022","journal-title":"Nat Methods"},{"key":"2024103010381648700_btae601-B29","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome 
Biol"},{"key":"2024103010381648700_btae601-B30","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1186\/s12864-015-1419-2","article-title":"CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers","volume":"16","author":"Ounit","year":"2015","journal-title":"BMC Genomics"},{"key":"2024103010381648700_btae601-B31","doi-asserted-by":"crossref","first-page":"i12","DOI":"10.1093\/bioinformatics\/btaa458","article-title":"ganon: precise metagenomics classification against large and up-to-date sets of reference sequences","volume":"36","author":"Piro","year":"2020","journal-title":"Bioinformatics"},{"key":"2024103010381648700_btae601-B32","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1146\/annurev-genom-090413-025358","article-title":"Alignment of next-generation sequencing reads","volume":"16","author":"Reinert","year":"2015","journal-title":"Annu Rev Genomics Hum Genet"},{"key":"2024103010381648700_btae601-B33","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2024103010381648700_btae601-B34","first-page":"205969","article-title":"Metagenome fragment classification using N-mer frequency profiles","volume":"2008","author":"Rosen","year":"2008","journal-title":"Adv Bioinf"},{"key":"2024103010381648700_btae601-B35","doi-asserted-by":"crossref","first-page":"e218","DOI":"10.1002\/cpz1.218","article-title":"mOTUs: profiling taxonomic composition, transcriptional activity and strain populations of microbial communities","volume":"1","author":"Ruscheweyh","year":"2021","journal-title":"Curr Protoc"},{"key":"2024103010381648700_btae601-B36","doi-asserted-by":"crossref","first-page":"1208","DOI":"10.1101\/gr.260398.119","article-title":"LEMMI: a continuous benchmarking platform for metagenomics 
classifiers","volume":"30","author":"Seppey","year":"2020","journal-title":"Genome Res"},{"key":"2024103010381648700_btae601-B37","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/2042-5783-2-3","article-title":"Metagenomics\u2014a guide from sampling to data analysis","volume":"2","author":"Thomas","year":"2012","journal-title":"Microb Inform Exp"},{"key":"2024103010381648700_btae601-B38","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/978-1-4939-8561-6_2","article-title":"MetaVW: large-scale machine learning for metagenomics sequence classification","volume":"1807","author":"Vervier","year":"2018","journal-title":"Methods Mol Biol"},{"key":"2024103010381648700_btae601-B39","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome Biol"},{"key":"2024103010381648700_btae601-B40","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2024103010381648700_btae601-B41","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1016\/j.cell.2019.07.010","article-title":"Benchmarking metagenomics tools for taxonomic classification","volume":"178","author":"Ye","year":"2019","journal-title":"Cell"},{"key":"2024103010381648700_btae601-B42","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1186\/s12859-020-03744-7","article-title":"Keeping up with the genomes: efficient learning of our increasing knowledge of the tree of life","volume":"21","author":"Zhao","year":"2020","journal-title":"BMC 
Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae601\/59713759\/btae601.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/10\/btae601\/60213690\/btae601.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/10\/btae601\/60213690\/btae601.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T15:14:49Z","timestamp":1730301289000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae601\/7817804"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,10,1]]},"references-count":42,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae601","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,10]]},"published":{"date-parts":[[2024,10,1]]},"article-number":"btae601"}}