{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:17:27Z","timestamp":1761895047587},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Molecular studies of microbial diversity have provided many insights into the bacterial communities inhabiting the human body and the environment. A common first step in such studies is a survey of conserved marker genes (primarily 16S rRNA) to characterize the taxonomic composition and diversity of these communities. To date, however, there exists significant variability in analysis methods employed in these studies.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Here we provide a critical assessment of current analysis methodologies that cluster sequences into operational taxonomic units (OTUs) and demonstrate that small changes in algorithm parameters can lead to significantly varying results. Our analysis provides strong evidence that the species-level diversity estimates produced using common OTU methodologies are inflated due to overly stringent parameter choices. 
We further describe an example of how semi-supervised clustering can produce OTUs that are more robust to changes in algorithm parameters.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Our results highlight the need for systematic and open evaluation of data analysis methodologies, especially as targeted 16S rRNA diversity studies are increasingly relying on high-throughput sequencing technologies. All data and results from our study are available through the JGI FAMeS website <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/fames.jgi-psf.org\/\" ext-link-type=\"uri\">http:\/\/fames.jgi-psf.org\/<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-11-152","type":"journal-article","created":{"date-parts":[[2010,3,24]],"date-time":"2010-03-24T19:18:04Z","timestamp":1269458284000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":62,"title":["Alignment and clustering of phylogenetic markers - implications for microbial diversity studies"],"prefix":"10.1186","volume":"11","author":[{"given":"James R","family":"White","sequence":"first","affiliation":[]},{"given":"Saket","family":"Navlakha","sequence":"additional","affiliation":[]},{"given":"Niranjan","family":"Nagarajan","sequence":"additional","affiliation":[]},{"given":"Mohammad-Reza","family":"Ghodsi","sequence":"additional","affiliation":[]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[]},{"given":"Mihai","family":"Pop","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,3,24]]},"reference":[{"key":"3609_CR1","doi-asserted-by":"publisher","first-page":"1635","DOI":"10.1126\/science.1110591","volume":"308","author":"PB Eckburg","year":"2005","unstructured":"Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, Gill 
SR, Nelson KE, Relman DA: Diversity of the human intestinal microbial flora. Science 2005, 308: 1635\u20131638. 10.1126\/science.1110591","journal-title":"Science"},{"key":"3609_CR2","doi-asserted-by":"publisher","first-page":"e280","DOI":"10.1371\/journal.pbio.0060280","volume":"6","author":"L Dethlefsen","year":"2008","unstructured":"Dethlefsen L, Huse S, Sogin ML, Relman DA: The Pervasive Effects of an Antibiotic on the Human Gut Microbiota, as Revealed by Deep 16S rRNA Sequencing. PLoS Biol 2008, 6: e280. 10.1371\/journal.pbio.0060280","journal-title":"PLoS Biol"},{"key":"3609_CR3","doi-asserted-by":"publisher","first-page":"1043","DOI":"10.1101\/gr.075549.107","volume":"18","author":"EA Grice","year":"2008","unstructured":"Grice EA, Kong HH, Renaud G, Young AC, Bouffard GG, Blakesley RW, Wolfsberg TG, Turner ML, Segre JA: A diversity profile of the human skin microbiota. Genome Res 2008, 18: 1043\u20131050. 10.1101\/gr.075549.107","journal-title":"Genome Res"},{"key":"3609_CR4","doi-asserted-by":"publisher","first-page":"e1000255","DOI":"10.1371\/journal.pgen.1000255","volume":"4","author":"SM Huse","year":"2008","unstructured":"Huse SM, Dethlefsen L, Huber JA, Welch DM, Relman DA, Sogin ML: Exploring Microbial Diversity and Taxonomy Using SSU rRNA Hypervariable Tag Sequencing. PLoS genetics 2008, 4: e1000255. 10.1371\/journal.pgen.1000255","journal-title":"PLoS genetics"},{"key":"3609_CR5","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1038\/nature07540","volume":"457","author":"PJ Turnbaugh","year":"2009","unstructured":"Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, et al.: A core gut microbiome in obese and lean twins. Nature 2009, 457: 480\u2013484. 
10.1038\/nature07540","journal-title":"Nature"},{"key":"3609_CR6","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1371\/journal.pcbi.0010024","volume":"1","author":"K Chen","year":"2005","unstructured":"Chen K, Pachter L: Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS computational biology 2005, 1: 106\u2013112. 10.1371\/journal.pcbi.0010024","journal-title":"PLoS computational biology"},{"key":"3609_CR7","doi-asserted-by":"publisher","first-page":"5261","DOI":"10.1128\/AEM.00062-07","volume":"73","author":"Q Wang","year":"2007","unstructured":"Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and environmental microbiology 2007, 73: 5261\u20135267. 10.1128\/AEM.00062-07","journal-title":"Applied and environmental microbiology"},{"key":"3609_CR8","volume-title":"PHYLIP - phylogeny inference package (Version 3.2)","author":"J Felsenstein","year":"1989","unstructured":"Felsenstein J: PHYLIP - phylogeny inference package (Version 3.2). Cladistics 1989, 5.","edition":"3.2"},{"key":"3609_CR9","doi-asserted-by":"crossref","first-page":"4765","DOI":"10.1128\/JB.180.18.4765-4774.1998","volume":"180","author":"P Hugenholtz","year":"1998","unstructured":"Hugenholtz P, Goebel BM, Pace NR: Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol 1998, 180: 4765\u20134774.","journal-title":"J Bacteriol"},{"key":"3609_CR10","doi-asserted-by":"publisher","first-page":"654","DOI":"10.1046\/j.1462-2920.2002.00352.x","volume":"4","author":"M Sait","year":"2002","unstructured":"Sait M, Hugenholtz P, Janssen PH: Cultivation of globally distributed soil bacteria from phylogenetic lineages previously only detected in cultivation-independent surveys. 
Environ Microbiol 2002, 4: 654\u2013666. 10.1046\/j.1462-2920.2002.00352.x","journal-title":"Environ Microbiol"},{"key":"3609_CR11","doi-asserted-by":"publisher","first-page":"e92","DOI":"10.1371\/journal.pcbi.0020092","volume":"2","author":"PD Schloss","year":"2006","unstructured":"Schloss PD, Handelsman J: Toward a census of bacteria in soil. PLoS computational biology 2006, 2: e92. 10.1371\/journal.pcbi.0020092","journal-title":"PLoS computational biology"},{"key":"3609_CR12","doi-asserted-by":"publisher","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","volume":"22","author":"W Li","year":"2006","unstructured":"Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22: 1658\u20131659. 10.1093\/bioinformatics\/btl158","journal-title":"Bioinformatics"},{"key":"3609_CR13","doi-asserted-by":"publisher","first-page":"12115","DOI":"10.1073\/pnas.0605127103","volume":"103","author":"ML Sogin","year":"2006","unstructured":"Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, Neal PR, Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and the underexplored \"rare biosphere\". Proc Natl Acad Sci USA 2006, 103: 12115\u201312120. 10.1073\/pnas.0605127103","journal-title":"Proc Natl Acad Sci USA"},{"key":"3609_CR14","doi-asserted-by":"publisher","first-page":"495","DOI":"10.1038\/nmeth1043","volume":"4","author":"K Mavromatis","year":"2007","unstructured":"Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, Rigoutsos I, Salamov A, Korzeniewski F, Land M, et al.: Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nature methods 2007, 4: 495\u2013500. 
10.1038\/nmeth1043","journal-title":"Nature methods"},{"key":"3609_CR15","doi-asserted-by":"publisher","first-page":"873","DOI":"10.1016\/j.jmva.2006.11.013","volume":"98","author":"M Meila","year":"2007","unstructured":"Meila M: Comparing clusterings - an information based distance. J Multivariate Anal 2007, 98: 873\u2013895. 10.1016\/j.jmva.2006.11.013","journal-title":"J Multivariate Anal"},{"key":"3609_CR16","doi-asserted-by":"publisher","first-page":"D294","DOI":"10.1093\/nar\/gki038","volume":"33","author":"JR Cole","year":"2005","unstructured":"Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM: The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res 2005, 33: D294\u2013296. 10.1093\/nar\/gki038","journal-title":"Nucleic Acids Res"},{"key":"3609_CR17","doi-asserted-by":"publisher","first-page":"W394","DOI":"10.1093\/nar\/gkl244","volume":"34","author":"TZ DeSantis Jr","year":"2006","unstructured":"DeSantis TZ Jr, Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM, Phan R, Andersen GL: NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res 2006, 34: W394\u2013399. 10.1093\/nar\/gkl244","journal-title":"Nucleic Acids Res"},{"key":"3609_CR18","unstructured":"The Taxonomic Outline of Bacteria and Archaea[http:\/\/www.taxonomicoutline.org\/]"},{"key":"3609_CR19","doi-asserted-by":"publisher","first-page":"5069","DOI":"10.1128\/AEM.03006-05","volume":"72","author":"TZ DeSantis","year":"2006","unstructured":"DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and environmental microbiology 2006, 72: 5069\u20135072. 
10.1128\/AEM.03006-05","journal-title":"Applied and environmental microbiology"},{"key":"3609_CR20","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"key":"3609_CR21","doi-asserted-by":"publisher","first-page":"1917","DOI":"10.1126\/science.1124696","volume":"312","author":"MR Lambais","year":"2006","unstructured":"Lambais MR, Crowley DE, Cury JC, Bull RC, Rodrigues RR: Bacterial diversity in tree canopies of the Atlantic forest. Science 2006, 312: 1917. 10.1126\/science.1124696","journal-title":"Science"},{"key":"3609_CR22","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1186\/1471-2105-5-113","volume":"5","author":"RC Edgar","year":"2004","unstructured":"Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 2004, 5: 113. 10.1186\/1471-2105-5-113","journal-title":"BMC Bioinformatics"},{"key":"3609_CR23","doi-asserted-by":"publisher","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","volume":"22","author":"JD Thompson","year":"1994","unstructured":"Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673\u20134680. 
10.1093\/nar\/22.22.4673","journal-title":"Nucleic Acids Res"},{"key":"3609_CR24","doi-asserted-by":"publisher","first-page":"1363","DOI":"10.1093\/nar\/gkh293","volume":"32","author":"W Ludwig","year":"2004","unstructured":"Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, et al.: ARB: a software environment for sequence data. Nucleic Acids Res 2004, 32: 1363\u20131371. 10.1093\/nar\/gkh293","journal-title":"Nucleic Acids Res"},{"key":"3609_CR25","doi-asserted-by":"publisher","first-page":"1501","DOI":"10.1128\/AEM.71.3.1501-1506.2005","volume":"71","author":"PD Schloss","year":"2005","unstructured":"Schloss PD, Handelsman J: Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Applied and environmental microbiology 2005, 71: 1501\u20131506. 10.1128\/AEM.71.3.1501-1506.2005","journal-title":"Applied and environmental microbiology"},{"key":"3609_CR26","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1007\/978-3-642-02008-7_29","volume":"5541","author":"S Navlakha","year":"2009","unstructured":"Navlakha S, White JR, Nagarajan N, Pop M, Kingsford C: Finding Biologically Accurate Clusterings in Hierarchical Decompositions Using the Variation of Information. Lecture Notes in Computer Science: Research in Computational Molecular Biology 2009, 5541: 400\u2013417.","journal-title":"Lecture Notes in Computer Science: Research in Computational Molecular Biology"},{"key":"3609_CR27","first-page":"265","volume":"11","author":"A Chao","year":"1984","unstructured":"Chao A: Non-parametric estimation of the number of classes in a population. Scand J Stat 1984, 11: 265\u2013270.","journal-title":"Scand J Stat"},{"key":"3609_CR28","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1080\/01621459.1992.10475194","volume":"87","author":"A Chao","year":"1992","unstructured":"Chao A, Lee SM: Estimating the Number of Classes Via Sample Coverage. 
J Am Stat Assoc 1992, 87: 210\u2013217. 10.2307\/2290471","journal-title":"J Am Stat Assoc"},{"key":"3609_CR29","first-page":"623","volume":"27","author":"CE Shannon","year":"1948","unstructured":"Shannon CE: A Mathematical Theory of Communication. At&T Tech J 1948, 27: 623\u2013656.","journal-title":"At&T Tech J"},{"key":"3609_CR30","doi-asserted-by":"publisher","first-page":"REVIEWS0003","DOI":"10.1186\/gb-2002-3-2-reviews0003","volume":"3","author":"P Hugenholtz","year":"2002","unstructured":"Hugenholtz P: Exploring prokaryotic diversity in the genomic era. Genome Biol 2002, 3: REVIEWS0003. 10.1186\/gb-2002-3-2-reviews0003","journal-title":"Genome Biol"},{"key":"3609_CR31","first-page":"115","volume-title":"Nucleic Acid Techniques in Bacterial Systematics","author":"DJ Lane","year":"1991","unstructured":"Lane DJ: 16S\/23S rRNA sequencing. In Nucleic Acid Techniques in Bacterial Systematics. New York: Wiley; 1991:115\u2013175."},{"key":"3609_CR32","doi-asserted-by":"publisher","first-page":"6ra14","DOI":"10.1126\/scitranslmed.3000322","volume":"1","author":"P Turnbaugh","year":"2009","unstructured":"Turnbaugh P, Ridaura V, Faith J, Rey FE, Knight R, Gordon J: The Effect of Diet on the Human Gut Microbiome: A Metagenomic Analysis in Humanized Gnotobiotic Mice. Sci Transl Med 2009, 1: 6ra14.","journal-title":"Sci Transl Med"},{"key":"3609_CR33","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.1038\/nature05414","volume":"444","author":"PJ Turnbaugh","year":"2006","unstructured":"Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI: An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 2006, 444: 1027\u20131031. 
10.1038\/nature05414","journal-title":"Nature"},{"key":"3609_CR34","doi-asserted-by":"publisher","first-page":"e1000352","DOI":"10.1371\/journal.pcbi.1000352","volume":"5","author":"JR White","year":"2009","unstructured":"White JR, Nagarajan N, Pop M: Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS computational biology 2009, 5: e1000352. 10.1371\/journal.pcbi.1000352","journal-title":"PLoS computational biology"},{"key":"3609_CR35","doi-asserted-by":"publisher","first-page":"804","DOI":"10.1038\/nature06244","volume":"449","author":"PJ Turnbaugh","year":"2007","unstructured":"Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI: The human microbiome project. Nature 2007, 449: 804\u2013810. 10.1038\/nature06244","journal-title":"Nature"},{"key":"3609_CR36","doi-asserted-by":"publisher","first-page":"3470","DOI":"10.1128\/AEM.02120-06","volume":"73","author":"V Corby-Harris","year":"2007","unstructured":"Corby-Harris V, Pontaroli AC, Shimkets LJ, Bennetzen JL, Habel KE, Promislow DE: Geographical distribution and diversity of bacteria associated with natural populations of Drosophila melanogaster. Applied and environmental microbiology 2007, 73: 3470\u20133479. 10.1128\/AEM.02120-06","journal-title":"Applied and environmental microbiology"},{"key":"3609_CR37","doi-asserted-by":"publisher","first-page":"1888","DOI":"10.1111\/j.1462-2920.2008.01614.x","volume":"10","author":"J Kennedy","year":"2008","unstructured":"Kennedy J, Codling CE, Jones BV, Dobson AD, Marchesi JR: Diversity of microbes associated with the marine sponge, Haliclona simulans, isolated from Irish waters and identification of polyketide synthase genes from the sponge metagenome. Environ Microbiol 2008, 10: 1888\u20131902. 
10.1111\/j.1462-2920.2008.01614.x","journal-title":"Environ Microbiol"},{"key":"3609_CR38","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1126\/science.1146689","volume":"318","author":"JA Huber","year":"2007","unstructured":"Huber JA, Mark Welch DB, Morrison HG, Huse SM, Neal PR, Butterfield DA, Sogin ML: Microbial population structures in the deep marine biosphere. Science 2007, 318: 97\u2013100. 10.1126\/science.1146689","journal-title":"Science"},{"key":"3609_CR39","doi-asserted-by":"publisher","first-page":"560","DOI":"10.1038\/nature06269","volume":"450","author":"F Warnecke","year":"2007","unstructured":"Warnecke F, Luginbuhl P, Ivanova N, Ghassemian M, Richardson TH, Stege JT, Cayouette M, McHardy AC, Djordjevic G, Aboushadi N, et al.: Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 2007, 450: 560\u2013565. 10.1038\/nature06269","journal-title":"Nature"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-152.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T05:19:27Z","timestamp":1630473567000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-152"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,3,24]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3609"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-152","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,3,24]]},"assertion":[{"value":"5 December 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 March 
2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 March 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"152"}}