{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T02:10:33Z","timestamp":1774577433027,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2016,10,5]],"date-time":"2016-10-05T00:00:00Z","timestamp":1475625600000},"content-version":"vor","delay-in-days":2092,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: With the advancements of next-generation sequencing technology, it is now possible to study samples directly obtained from the environment. Particularly, 16S rRNA gene sequences have been frequently used to profile the diversity of organisms in a sample. However, such studies are still taxed to determine both the number of operational taxonomic units (OTUs) and their relative abundance in a sample.<\/jats:p>\n               <jats:p>Results: To address these challenges, we propose an unsupervised Bayesian clustering method termed Clustering 16S rRNA for OTU Prediction (CROP). CROP can find clusters based on the natural organization of data without setting a hard cut-off threshold (3%\/5%) as required by hierarchical clustering methods. By applying our method to several datasets, we demonstrate that CROP is robust against sequencing errors and that it produces more accurate results than conventional hierarchical clustering methods.<\/jats:p>\n               <jats:p>Availability and Implementation: Source code freely available at the following URL: http:\/\/code.google.com\/p\/crop-tingchenlab\/, implemented in C++ and supported on Linux and MS Windows.<\/jats:p>\n               <jats:p>Contact: \u00a0tingchen@usc.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq725","type":"journal-article","created":{"date-parts":[[2011,1,14]],"date-time":"2011-01-14T04:50:46Z","timestamp":1294980646000},"page":"611-618","source":"Crossref","is-referenced-by-count":221,"title":["Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering"],"prefix":"10.1093","volume":"27","author":[{"given":"Xiaolin","family":"Hao","sequence":"first","affiliation":[{"name":"1 Molecular and Computational Biology Program, Department of Biology, University of Southern California, University Park, Los Angeles, CA 90089, USA and 2MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST\/Department of Automation, Tsinghua University, Beijing, 100084, China"}]},{"given":"Rui","family":"Jiang","sequence":"additional","affiliation":[{"name":"1 Molecular and Computational Biology Program, Department of Biology, University of Southern California, University Park, Los Angeles, CA 90089, USA and 2MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST\/Department of Automation, Tsinghua University, Beijing, 100084, China"}]},{"given":"Ting","family":"Chen","sequence":"additional","affiliation":[{"name":"1 Molecular and Computational Biology Program, Department of Biology, University of Southern California, University Park, Los Angeles, CA 90089, USA and 2MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST\/Department of Automation, Tsinghua University, Beijing, 100084, China"}]}],"member":"286","published-online":{"date-parts":[[2011,1,13]]},"reference":[{"key":"2023012511574639900_B1","doi-asserted-by":"crossref","first-page":"1765","DOI":"10.1093\/bioinformatics\/btn244","article-title":"Efficient functional clustering of protein sequences using the Dirichlet process","volume":"24","author":"Brown","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012511574639900_B2","doi-asserted-by":"crossref","first-page":"D141","DOI":"10.1093\/nar\/gkn879","article-title":"The ribosomal database project: improved alignments and new tools for rRNA analysis","volume":"37","author":"Cole","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012511574639900_B3","doi-asserted-by":"crossref","first-page":"1694","DOI":"10.1126\/science.1177486","article-title":"Bacterial community variation in human body habitats across space and time","volume":"326","author":"Costello","year":"2009","journal-title":"Science"},{"key":"2023012511574639900_B4","doi-asserted-by":"crossref","first-page":"W394","DOI":"10.1093\/nar\/gkl244","article-title":"NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes","volume":"34","author":"DeSantis","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012511574639900_B5","doi-asserted-by":"crossref","first-page":"e82","DOI":"10.1371\/journal.pbio.0050082","article-title":"Environmental shotgun sequencing: its potential and challenges for studying the hidden world of microbes","volume":"5","author":"Eisen","year":"2007","journal-title":"PLoS Biol."},{"key":"2023012511574639900_B6","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/nar\/30.7.1575","article-title":"An efficient algorithm for large-scale detection of protein families","volume":"30","author":"Enright","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012511574639900_B7","doi-asserted-by":"crossref","first-page":"1190","DOI":"10.1126\/science.1171700","article-title":"Topographical and temporal diversity of the human skin microbiome","volume":"324","author":"Grice","year":"2009","journal-title":"Science"},{"key":"2023012511574639900_B8","doi-asserted-by":"crossref","first-page":"R143","DOI":"10.1186\/gb-2007-8-7-r143","article-title":"Accuracy and quality of massively parallel DNA pyrosequencing","volume":"8","author":"Huse","year":"2007","journal-title":"Genome Biol."},{"key":"2023012511574639900_B9","doi-asserted-by":"crossref","first-page":"1889","DOI":"10.1111\/j.1462-2920.2010.02193.x","article-title":"Ironing out the wrinkles in the rare biosphere through improved OTU clustering","volume":"12","author":"Huse","year":"2010","journal-title":"Environ. Microbiol."},{"key":"2023012511574639900_B10","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1007\/BF02289588","article-title":"Hierarchical clustering schemes","volume":"32","author":"Johnson","year":"1967","journal-title":"Psychometrika"},{"key":"2023012511574639900_B11","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1093\/nar\/gki198","article-title":"MAFFT version 5: improvement in accuracy of multiple sequence alignment","volume":"33","author":"Katoh","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012511574639900_B12","volume-title":"Metagenomics: Theory, Methods and Applications","author":"Marco","year":"2010"},{"key":"2023012511574639900_B13","doi-asserted-by":"crossref","first-page":"2466","DOI":"10.1093\/bioinformatics\/btl411","article-title":"Bayesian search of functionally divergent protein subgroups and their function specific residues","volume":"22","author":"Marttinen","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012511574639900_B14","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023012511574639900_B15","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1038\/nmeth.1361","article-title":"Accurate determination of microbial diversity from 454 pyrosequencing data","volume":"6","author":"Quince","year":"2009","journal-title":"Nat. Methods"},{"key":"2023012511574639900_B16","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1111\/1467-9868.00095","article-title":"On Bayesian analysis of mixtures with an unknown number of components","volume":"59","author":"Richardson","year":"1997","journal-title":"J.R. Stat.Soc. Ser. B (Methodol.)"},{"key":"2023012511574639900_B17","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1038\/nbt1485","article-title":"The development and impact of 454 sequencing","volume":"26","author":"Rothberg","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"2023012511574639900_B18","doi-asserted-by":"crossref","first-page":"e1000844","DOI":"10.1371\/journal.pcbi.1000844","article-title":"The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies","volume":"6","author":"Schloss","year":"2010","journal-title":"PLoS Comput. Biol."},{"key":"2023012511574639900_B19","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1128\/AEM.71.3.1501-1506.2005","article-title":"Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness","volume":"71","author":"Schloss","year":"2005","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012511574639900_B20","doi-asserted-by":"crossref","first-page":"7537","DOI":"10.1128\/AEM.01541-09","article-title":"Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities","volume":"75","author":"Schloss","year":"2009","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012511574639900_B21","doi-asserted-by":"crossref","first-page":"12115","DOI":"10.1073\/pnas.0605127103","article-title":"Microbial diversity in the deep sea and the underexplored \u201crare biosphere\u201d","volume":"103","author":"Sogin","year":"2006","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511574639900_B22","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1214\/aos\/1016120364","article-title":"Bayesian analysis of mixture models with an unknown number of components - an alternative to reversible jump methods","volume":"28","author":"Stephens","year":"2000","journal-title":"Ann. Stat."},{"key":"2023012511574639900_B23","doi-asserted-by":"crossref","first-page":"e76","DOI":"10.1093\/nar\/gkp285","article-title":"ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences","volume":"37","author":"Sun","year":"2009","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/5\/611\/48867989\/bioinformatics_27_5_611.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/5\/611\/48867989\/bioinformatics_27_5_611.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T12:46:56Z","timestamp":1674650816000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/5\/611\/1745910"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,1,13]]},"references-count":23,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2011,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq725","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,3,1]]},"published":{"date-parts":[[2011,1,13]]}}}