{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:24Z","timestamp":1772138064522,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2024,3,16]],"date-time":"2024-03-16T00:00:00Z","timestamp":1710547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Institute of Health","award":["R35GM142725"],"award-info":[{"award-number":["R35GM142725"]}]},{"DOI":"10.13039\/501100016056","name":"Minderoo Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100016056","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Taxonomic classification of short reads and taxonomic profiling of metagenomic samples are well-studied yet challenging problems. The presence of species belonging to groups without close representation in a reference dataset is particularly challenging. While k-mer-based methods have performed well in terms of running time and accuracy, they tend to have reduced accuracy for such novel species. Thus, there is a growing need for methods that combine the scalability of k-mers with increased sensitivity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we show that using locality-sensitive hashing (LSH) can increase the sensitivity of the k-mer-based search. Our method, which combines LSH with several heuristics techniques including soft lowest common ancestor labeling and voting, is more accurate than alternatives in both taxonomic classification of individual reads and abundance profiling.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>CONSULT-II is implemented in C++, and the software, together with reference libraries, is publicly available on GitHub https:\/\/github.com\/bo1929\/CONSULT-II.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae150","type":"journal-article","created":{"date-parts":[[2024,3,14]],"date-time":"2024-03-14T18:26:58Z","timestamp":1710440818000},"source":"Crossref","is-referenced-by-count":11,"title":["CONSULT-II: accurate taxonomic identification and profiling using locality-sensitive hashing"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4396-817X","authenticated-orcid":false,"given":"Ali Osman Berk","family":"\u015eapc\u0131","sequence":"first","affiliation":[{"name":"Bioinformatics and Systems Biology Graduate Program, University of California , San Diego, CA 92093, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6104-5750","authenticated-orcid":false,"given":"Eleonora","family":"Rachtman","sequence":"additional","affiliation":[{"name":"Bioinformatics and Systems Biology Graduate Program, University of California , San Diego, CA 92093, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5410-1518","authenticated-orcid":false,"given":"Siavash","family":"Mirarab","sequence":"additional","affiliation":[{"name":"Bioinformatics and Systems Biology Graduate Program, University of California , San Diego, CA 92093, United States"},{"name":"Department of Electrical and Computer Engineering, University of California , San Diego, CA 92093, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,3,16]]},"reference":[{"key":"2024040210052713400_btae150-B1","doi-asserted-by":"crossref","first-page":"2253","DOI":"10.1093\/bioinformatics\/btt389","article-title":"Scalable metagenomic taxonomy classification using a reference genome database","volume":"29","author":"Ames","year":"2013","journal-title":"Bioinformatics"},{"key":"2024040210052713400_btae150-B2","doi-asserted-by":"crossref","first-page":"2500","DOI":"10.1038\/s41467-020-16366-7","article-title":"Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0","volume":"11","author":"Asnicar","year":"2020","journal-title":"Nat Commun"},{"key":"2024040210052713400_btae150-B3","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1038\/nbt.3238","article-title":"Assembling large genomes with single-molecule sequencing and locality-sensitive hashing","volume":"33","author":"Berlin","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2024040210052713400_btae150-B4","author":"Blanke","year":"2020"},{"key":"2024040210052713400_btae150-B5","first-page":"310","author":"Brown","year":"2013"},{"key":"2024040210052713400_btae150-B6","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1093\/bioinformatics\/17.5.419","article-title":"Efficient large-scale sequence comparison by locality-sensitive hashing","volume":"17","author":"Buhler","year":"2001","journal-title":"Bioinformatics"},{"key":"2024040210052713400_btae150-B7","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1038\/ismej.2016.168","article-title":"Strategies to improve reference databases for soil microbiomes","volume":"11","author":"Choi","year":"2017","journal-title":"ISME J"},{"key":"2024040210052713400_btae150-B8","doi-asserted-by":"crossref","first-page":"669","DOI":"10.1128\/MMBR.68.4.669-685.2004","article-title":"Metagenomics: application of genomics to uncultured microorganisms","volume":"68","author":"Handelsman","year":"2004","journal-title":"Microbiol Mol Biol Rev"},{"key":"2024040210052713400_btae150-B9","doi-asserted-by":"crossref","first-page":"321","DOI":"10.4086\/toc.2012.v008a014","article-title":"Approximate nearest neighbors: towards removing the curse of dimensionality","volume":"8","author":"Har-Peled","year":"2012","journal-title":"Theory of Comput"},{"key":"2024040210052713400_btae150-B10","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2024040210052713400_btae150-B11","doi-asserted-by":"crossref","first-page":"638","DOI":"10.1186\/s12859-019-3205-7","article-title":"Read-SpaM: assembly-free and alignment-free comparison of bacterial genomes with low sequencing coverage","volume":"20","author":"Lau","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2024040210052713400_btae150-B12","doi-asserted-by":"crossref","first-page":"lqaa009","DOI":"10.1093\/nargab\/lqaa009","article-title":"DeepMicrobes: taxonomic classification for metagenomics with deep learning","volume":"2","author":"Liang","year":"2020","journal-title":"NAR Genom Bioinform"},{"key":"2024040210052713400_btae150-B13","first-page":"95","author":"Liu","year":"2011"},{"key":"2024040210052713400_btae150-B14","doi-asserted-by":"crossref","first-page":"5970","DOI":"10.1073\/pnas.1521291113","article-title":"Scaling laws predict global microbial diversity","volume":"113","author":"Locey","year":"2016","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024040210052713400_btae150-B15","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl Environ Microbiol"},{"key":"2024040210052713400_btae150-B16","doi-asserted-by":"crossref","first-page":"e104","DOI":"10.7717\/peerj-cs.104","article-title":"Bracken: estimating species abundance in metagenomics data","volume":"3","author":"Lu","year":"2017","journal-title":"PeerJ Computer Sci"},{"key":"2024040210052713400_btae150-B17","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1093\/bioinformatics\/bty611","article-title":"Metagenomic binning through low-density hashing","volume":"35","author":"Luo","year":"2019","journal-title":"Bioinformatics"},{"key":"2024040210052713400_btae150-B18","article-title":"Greengenes2 unifies microbial data in a single reference tree","author":"McDonald","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024040210052713400_btae150-B19","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1186\/s13059-017-1299-7","article-title":"Comprehensive benchmarking and ensemble approaches for metagenomic classifiers","volume":"18","author":"McIntyre","year":"2017","journal-title":"Genome Biol"},{"key":"2024040210052713400_btae150-B20","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/s13059-019-1646-y","article-title":"Assessing taxonomic metagenome profilers with OPAL","volume":"20","author":"Meyer","year":"2019","journal-title":"Genome Biol"},{"key":"2024040210052713400_btae150-B21","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1038\/s41592-022-01431-4","article-title":"Critical assessment of metagenome interpretation: the second round of challenges","volume":"19","author":"Meyer","year":"2022","journal-title":"Nat Methods"},{"key":"2024040210052713400_btae150-B22","doi-asserted-by":"crossref","first-page":"1014","DOI":"10.1038\/s41467-019-08844-4","article-title":"Microbial abundance, activity and population genomic profiling with mOTUs2","volume":"10","author":"Milanese","year":"2019","journal-title":"Nat Commun"},{"key":"2024040210052713400_btae150-B23","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1186\/s13059-018-1554-6","article-title":"RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification","volume":"19","author":"Nasko","year":"2018","journal-title":"Genome Biol"},{"key":"2024040210052713400_btae150-B24","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2024040210052713400_btae150-B25","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1186\/s12864-015-1419-2","article-title":"CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers","volume":"16","author":"Ounit","year":"2015","journal-title":"BMC Genomics"},{"key":"2024040210052713400_btae150-B26","doi-asserted-by":"crossref","first-page":"1623","DOI":"10.1016\/j.cell.2019.11.017","article-title":"Charting the complexity of the marine microbiome through single-cell genomics","volume":"179","author":"Pachiadaki","year":"2019","journal-title":"Cell"},{"key":"2024040210052713400_btae150-B27","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1038\/s41587-020-0501-8","article-title":"A complete domain-to-species taxonomy for Bacteria and Archaea","volume":"38","author":"Parks","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2024040210052713400_btae150-B28","doi-asserted-by":"crossref","first-page":"1755","DOI":"10.1111\/1755-0998.13135","article-title":"The impact of contaminants on the accuracy of genome skimming and the effectiveness of exclusion read filters","volume":"20","author":"Rachtman","year":"2020","journal-title":"Mol Ecol Resour"},{"key":"2024040210052713400_btae150-B29","doi-asserted-by":"crossref","first-page":"lqab071","DOI":"10.1093\/nargab\/lqab071","article-title":"CONSULT: accurate contamination removal using locality-sensitive hashing","volume":"3","author":"Rachtman","year":"2021","journal-title":"NAR Genom Bioinform"},{"key":"2024040210052713400_btae150-B30","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1186\/1752-0509-7-S4-S11","article-title":"16S rRNA metagenome clustering and diversity estimation using locality sensitive hashing","volume":"7","author":"Rasheed","year":"2013","journal-title":"BMC Syst Biol"},{"key":"2024040210052713400_btae150-B31","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1007\/978-3-031-36911-7_13","volume-title":"Comparative Genomics","author":"\u015eapc\u0131","year":"2023"},{"key":"2024040210052713400_btae150-B32","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1038\/nmeth.4458","article-title":"Critical assessment of metagenome interpretation\u2014a benchmark of metagenomics software","volume":"14","author":"Sczyrba","year":"2017","journal-title":"Nat Methods"},{"key":"2024040210052713400_btae150-B33","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1038\/nmeth.2066","article-title":"Metagenomic microbial community profiling using unique clade-specific marker genes","volume":"9","author":"Segata","year":"2012","journal-title":"Nat Methods"},{"key":"2024040210052713400_btae150-B34","doi-asserted-by":"crossref","first-page":"1839","DOI":"10.1093\/bioinformatics\/btab023","article-title":"TIPP2: metagenomic taxonomic profiling using phylogenetic markers","volume":"37","author":"Shah","year":"2021","journal-title":"Bioinformatics"},{"key":"2024040210052713400_btae150-B35","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/nmeth.2693","article-title":"Metagenomic species profiling using universal phylogenetic marker genes","volume":"10","author":"Sunagawa","year":"2013","journal-title":"Nat Methods"},{"key":"2024040210052713400_btae150-B36","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1038\/nmeth.3589","article-title":"MetaPhlAn2 for enhanced metagenomic taxonomic profiling","volume":"12","author":"Truong","year":"2015","journal-title":"Nat Methods"},{"key":"2024040210052713400_btae150-B37","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1186\/s13059-019-1817-x","article-title":"Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT","volume":"20","author":"von Meijenfeldt","year":"2019","journal-title":"Genome Biol"},{"key":"2024040210052713400_btae150-B38","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome Biol"},{"key":"2024040210052713400_btae150-B39","doi-asserted-by":"crossref","first-page":"D545","DOI":"10.1093\/nar\/gkz764","article-title":"GMrepo: a database of curated and consistently annotated human gut metagenomes","volume":"48","author":"Wu","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2024040210052713400_btae150-B40","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1016\/j.cell.2019.07.010","article-title":"Benchmarking metagenomics tools for taxonomic classification","volume":"178","author":"Ye","year":"2019","journal-title":"Cell"},{"key":"2024040210052713400_btae150-B41","doi-asserted-by":"crossref","first-page":"5477","DOI":"10.1038\/s41467-019-13443-4","article-title":"Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea","volume":"10","author":"Zhu","year":"2019","journal-title":"Nat Commun"},{"key":"2024040210052713400_btae150-B43","doi-asserted-by":"crossref","first-page":"e0016722","DOI":"10.1128\/msystems.00167-22","article-title":"Phylogeny-aware analysis of metagenome community ecology based on matched reference genomes while bypassing taxonomy","volume":"7","author":"Zhu","year":"2022","journal-title":"mSystems"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae150\/56995286\/btae150.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/4\/btae150\/57137056\/btae150.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/4\/btae150\/57137056\/btae150.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,2]],"date-time":"2024-04-02T09:42:56Z","timestamp":1712050976000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae150\/7630488"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,3,16]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae150","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.11.07.566115","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,4,1]]},"published":{"date-parts":[[2024,3,16]]},"article-number":"btae150"}}