{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T18:46:43Z","timestamp":1776365203963,"version":"3.51.2"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2018,2,28]],"date-time":"2018-02-28T00:00:00Z","timestamp":1519776000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The 16S ribosomal RNA (rRNA) gene is widely used to survey microbial communities. Sequences are often clustered into Operational Taxonomic Units (OTUs) as proxies for species. The canonical clustering threshold is 97% identity, which was proposed in 1994 when few 16S rRNA sequences were available, motivating a reassessment on current data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Using a large set of high-quality 16S rRNA sequences from finished genomes, I assessed the correspondence of OTUs to species for five representative clustering algorithms using four accuracy metrics. All algorithms had comparable accuracy when tuned to a given metric. Optimal identity thresholds were \u223c99% for full-length sequences and \u223c100% for the V4 hypervariable region.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Reference sequences and source code are provided in the Supplementary Material.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty113","type":"journal-article","created":{"date-parts":[[2018,2,27]],"date-time":"2018-02-27T07:17:36Z","timestamp":1519715856000},"page":"2371-2375","source":"Crossref","is-referenced-by-count":607,"title":["Updating the 97% identity threshold for 16S ribosomal RNA OTUs"],"prefix":"10.1093","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7355-2541","authenticated-orcid":false,"given":"Robert C","family":"Edgar","sequence":"first","affiliation":[{"name":"Sonoma, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,2,28]]},"reference":[{"key":"2023012712584680100_bty113-B1","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1093\/bioinformatics\/16.5.412","article-title":"Assessing the accuracy of prediction algorithms for classification: an overview","volume":"16","author":"Baldi","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012712584680100_bty113-B2","doi-asserted-by":"crossref","first-page":"D36","DOI":"10.1093\/nar\/gks1195","article-title":"GenBank","volume":"41","author":"Benson","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023012712584680100_bty113-B3","doi-asserted-by":"crossref","first-page":"e95.","DOI":"10.1093\/nar\/gkr349","article-title":"ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time","volume":"39","author":"Cai","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023012712584680100_bty113-B4","doi-asserted-by":"crossref","DOI":"10.1038\/ismej.2017.119","article-title":"Exact sequence variants should replace operational taxonomic units in marker-gene data analysis","author":"Callahan","year":"2017","journal-title":"ISME J"},{"key":"2023012712584680100_bty113-B5","doi-asserted-by":"crossref","first-page":"581.","DOI":"10.1038\/nmeth.3869","article-title":"DADA2: high-resolution sample inference from Illumina amplicon data","volume":"13","author":"Callahan","year":"2016","journal-title":"Nat. Methods"},{"key":"2023012712584680100_bty113-B6","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1038\/nrg3182","article-title":"The human microbiome: at the interface of health and disease","volume":"13","author":"Cho","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023012712584680100_bty113-B7","doi-asserted-by":"crossref","DOI":"10.1002\/0471200611","volume-title":"Elements of Information Theory","author":"Cover","year":"1991"},{"key":"2023012712584680100_bty113-B8","doi-asserted-by":"crossref","first-page":"W394","DOI":"10.1093\/nar\/gkl244","article-title":"NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes","volume":"34","author":"DeSantis","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023012712584680100_bty113-B9","doi-asserted-by":"crossref","first-page":"5069","DOI":"10.1128\/AEM.03006-05","article-title":"Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB","volume":"72","author":"DeSantis","year":"2006","journal-title":"Appl. Environ. Microbiol"},{"key":"2023012712584680100_bty113-B10","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1099\/0022-1317-49-5-397","article-title":"Strain, clone and species: comments on three basic concepts of bacteriology","volume":"49","author":"Dijkshoorn","year":"2000","journal-title":"J. Med. Microbiol"},{"key":"2023012712584680100_bty113-B11","doi-asserted-by":"crossref","first-page":"116.","DOI":"10.1186\/gb-2006-7-9-116","article-title":"Genomics and the bacterial species problem","volume":"7","author":"Doolittle","year":"2006","journal-title":"Genome Biol"},{"key":"2023012712584680100_bty113-B12","author":"Edgar","year":"2017"},{"key":"2023012712584680100_bty113-B13","author":"Edgar","year":"2017"},{"key":"2023012712584680100_bty113-B14","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1038\/nmeth.2604","article-title":"UPARSE: highly accurate OTU sequences from microbial amplicon reads","volume":"10","author":"Edgar","year":"2013","journal-title":"Nat. Methods"},{"key":"2023012712584680100_bty113-B15","doi-asserted-by":"crossref","first-page":"3476","DOI":"10.1093\/bioinformatics\/btv401","article-title":"Error filtering, pair assembly and error correction for next-generation sequencing reads","volume":"31","author":"Edgar","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712584680100_bty113-B16","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ismej.2013.141","article-title":"Resistance and resilience of the forest soil microbiome to logging-associated compaction","volume":"8","author":"Hartmann","year":"2014","journal-title":"ISME J"},{"key":"2023012712584680100_bty113-B17","doi-asserted-by":"crossref","first-page":"1889","DOI":"10.1111\/j.1462-2920.2010.02193.x","article-title":"Ironing out the wrinkles in the rare biosphere through improved OTU clustering","volume":"12","author":"Huse","year":"2010","journal-title":"Environ. Microbiol"},{"key":"2023012712584680100_bty113-B18","doi-asserted-by":"crossref","first-page":"5112","DOI":"10.1128\/AEM.01043-13","article-title":"Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the miseq illumina sequencing platform","volume":"79","author":"Kozich","year":"2013","journal-title":"Appl. Environ. Microbiol"},{"key":"2023012712584680100_bty113-B19","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1093\/nar\/29.1.173","article-title":"The RDP-II (Ribosomal Database Project)","volume":"29","author":"Maidak","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023012712584680100_bty113-B20","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"BBA \u2013 Protein Struct"},{"key":"2023012712584680100_bty113-B21","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1038\/ismej.2011.139","article-title":"An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea","volume":"6","author":"McDonald","year":"2012","journal-title":"ISME J"},{"key":"2023012712584680100_bty113-B22","doi-asserted-by":"crossref","first-page":"aac8455.","DOI":"10.1126\/science.aac8455","article-title":"The global ocean microbiome","volume":"350","author":"Moran","year":"2015","journal-title":"Science"},{"key":"2023012712584680100_bty113-B23","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol"},{"key":"2023012712584680100_bty113-B24","doi-asserted-by":"crossref","first-page":"1096","DOI":"10.1126\/science.1058543","article-title":"Genes lost and genes found: evolution of bacterial pathogenesis and symbiosis","volume":"292","author":"Ochman","year":"2001","journal-title":"Science"},{"key":"2023012712584680100_bty113-B25","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1146\/annurev-pathol-011811-132421","article-title":"Human microbiome in health and disease","volume":"7","author":"Pflughoeft","year":"2012","journal-title":"Annu. Rev. Pathol"},{"key":"2023012712584680100_bty113-B26","doi-asserted-by":"crossref","first-page":"7188","DOI":"10.1093\/nar\/gkm864","article-title":"SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB","volume":"35","author":"Pruesse","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023012712584680100_bty113-B27","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1038\/nmeth.1361","article-title":"Accurate determination of microbial diversity from 454 pyrosequencing data","volume":"6","author":"Quince","year":"2009","journal-title":"Nat. Methods"},{"key":"2023012712584680100_bty113-B28","doi-asserted-by":"crossref","first-page":"e545.","DOI":"10.7717\/peerj.545","article-title":"Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences","volume":"2","author":"Rideout","year":"2014","journal-title":"PeerJ"},{"key":"2023012712584680100_bty113-B29","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/ismej.2008.5","article-title":"Evaluating different approaches that test whether microbial communities have the same structure","volume":"2","author":"Schloss","year":"2008","journal-title":"ISME J"},{"key":"2023012712584680100_bty113-B30","doi-asserted-by":"crossref","first-page":"e1000844.","DOI":"10.1371\/journal.pcbi.1000844","article-title":"The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies","volume":"6","author":"Schloss","year":"2010","journal-title":"PLoS Comput. Biol"},{"key":"2023012712584680100_bty113-B31","first-page":"1501","article-title":"Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness","volume-title":"Applied and Environmental Microbiology","author":"Schloss","year":"2005"},{"key":"2023012712584680100_bty113-B32","doi-asserted-by":"crossref","first-page":"3219","DOI":"10.1128\/AEM.02810-10","article-title":"Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis","volume":"77","author":"Schloss","year":"2011","journal-title":"Appl. Environ. Microbiol"},{"key":"2023012712584680100_bty113-B33","doi-asserted-by":"crossref","first-page":"7537","DOI":"10.1128\/AEM.01541-09","article-title":"Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities","volume":"75","author":"Schloss","year":"2009","journal-title":"Appl. Environ. Microbiol"},{"key":"2023012712584680100_bty113-B34","doi-asserted-by":"crossref","first-page":"9.","DOI":"10.1186\/1471-2105-2-9","article-title":"FastGroup: a program to dereplicate libraries of 16S rDNA sequences","volume":"2","author":"Seguritan","year":"2001","journal-title":"BMC Bioinformatics"},{"key":"2023012712584680100_bty113-B35","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1099\/00207713-44-4-846","article-title":"Taxonomic note: a place for DNA\u2013DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology","volume":"44","author":"Stackebrandt","year":"1994","journal-title":"Int. J. Syst. Evol. Microbiol"},{"key":"2023012712584680100_bty113-B36","doi-asserted-by":"crossref","first-page":"e76.","DOI":"10.1093\/nar\/gkp285","article-title":"ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences","volume":"37","author":"Sun","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023012712584680100_bty113-B37","article-title":"Multiple sequence alignment using ClustalW and ClustalX","author":"Thompson","year":"2002","journal-title":"Curr. Protoc. Bioinf"},{"key":"2023012712584680100_bty113-B38","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Wang","year":"2007","journal-title":"Appl. Environ. Microbiol"},{"key":"2023012712584680100_bty113-B39","doi-asserted-by":"crossref","first-page":"e1487.","DOI":"10.7717\/peerj.1487","article-title":"De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units","volume":"3","author":"Westcott","year":"2015","journal-title":"PeerJ"},{"key":"2023012712584680100_bty113-B40","doi-asserted-by":"crossref","first-page":"e00073","DOI":"10.1128\/mSphereDirect.00073-17","article-title":"OptiClust, an improved method for assigning amplicon-based sequence data to operational taxonomic units","volume":"2","author":"Westcott","year":"2017","journal-title":"mSphere"},{"key":"2023012712584680100_bty113-B41","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1038\/nrmicro3330","article-title":"Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences","volume":"12","author":"Yarza","year":"2014","journal-title":"Nat. Rev. Microbiol"},{"key":"2023012712584680100_bty113-B42","first-page":"153","article-title":"Identification and quantification of abundant species from pyrosequences of 16S rRNA by consensus alignment","volume":"2010","author":"Ye","year":"2011","journal-title":"Proceedings (IEEE Int. Conf. Bioinf. Biomed.)"},{"key":"2023012712584680100_bty113-B43","doi-asserted-by":"crossref","first-page":"D643","DOI":"10.1093\/nar\/gkt1209","article-title":"The SILVA and \u2018all-species Living Tree Project (LTP)\u2019 taxonomic frameworks","volume":"42","author":"Yilmaz","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023012712584680100_bty113-B44","doi-asserted-by":"crossref","first-page":"2182","DOI":"10.1093\/bioinformatics\/bts355","article-title":"DySC: software for greedy clustering of 16S rRNA reads","volume":"28","author":"Zheng","year":"2012","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/14\/2371\/48917724\/bioinformatics_34_14_2371.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/14\/2371\/48917724\/bioinformatics_34_14_2371.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T08:47:26Z","timestamp":1674809246000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/14\/2371\/4913809"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,2,28]]},"references-count":44,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2018,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty113","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/192211","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,15]]},"published":{"date-parts":[[2018,2,28]]}}}