{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:33:15Z","timestamp":1767961995649,"version":"3.49.0"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Binning 16S rRNA sequences into operational taxonomic units (OTUs) is an initial crucial step in analyzing large sequence datasets generated to determine microbial community compositions in various environments including that of the human gut. Various methods have been developed, but most suffer from either inaccuracies or from being unable to handle millions of sequences generated in current studies. Furthermore, existing binning methods usually require <jats:italic>a priori<\/jats:italic> decisions regarding binning parameters such as a distance level for defining an OTU.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We present a novel modularity-based approach (M-pick) to address the aforementioned problems. The new method utilizes ideas from community detection in graphs, where sequences are viewed as vertices on a weighted graph, each pair of sequences is connected by an imaginary edge, and the similarity of a pair of sequences represents the weight of the edge. M-pick first generates a graph based on pairwise sequence distances and then applies a modularity-based community detection technique on the graph to generate OTUs to capture the community structures in sequence data. 
To compare the performance of M-pick with that of existing methods, specifically CROP and ESPRIT-Tree, sequence data from different hypervariable regions of 16S rRNA were used and binning results were compared.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>A new modularity-based clustering method for OTU picking of 16S rRNA sequences is developed in this study. The algorithm does not require a predetermined cut-off level, and our simulation studies suggest that it is superior to existing methods that require specified distance levels to define OTUs. The source code is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/plaza.ufl.edu\/xywang\/Mpick.htm\" ext-link-type=\"uri\">http:\/\/plaza.ufl.edu\/xywang\/Mpick.htm<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-14-43","type":"journal-article","created":{"date-parts":[[2013,2,7]],"date-time":"2013-02-07T03:14:00Z","timestamp":1360206840000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["M-pick, a modularity-based method for OTU picking of 16S rRNA sequences"],"prefix":"10.1186","volume":"14","author":[{"given":"Xiaoyu","family":"Wang","sequence":"first","affiliation":[]},{"given":"Jin","family":"Yao","sequence":"additional","affiliation":[]},{"given":"Yijun","family":"Sun","sequence":"additional","affiliation":[]},{"given":"Volker","family":"Mai","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,2,7]]},"reference":[{"key":"5729_CR1","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1093\/bib\/bbr009","volume":"13","author":"Y Sun","year":"2011","unstructured":"Sun Y, Cai Y, Huse SM, Knight R, Farmerie WG, Wang X: A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. 
Brief Bioinform 2011, 13: 107-121.","journal-title":"Brief Bioinform"},{"key":"5729_CR2","doi-asserted-by":"publisher","first-page":"3219","DOI":"10.1128\/AEM.02810-10","volume":"77","author":"PD Schloss","year":"2011","unstructured":"Schloss PD, Westcott SL: Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol 2011, 77: 3219-3226. 10.1128\/AEM.02810-10","journal-title":"Appl Environ Microbiol"},{"key":"5729_CR3","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1093\/nar\/gki038","volume":"33","author":"JR Cole","year":"2005","unstructured":"Cole JR, Chai B, Farris BJ, Wang Q, Kulam SA, McGarrell DM: The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res 2005, 33: 294-296.","journal-title":"Nucleic Acids Res"},{"key":"5729_CR4","doi-asserted-by":"publisher","first-page":"5069","DOI":"10.1128\/AEM.03006-05","volume":"72","author":"TZ Desantis","year":"2006","unstructured":"Desantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL: Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 2006, 72: 5069-72. 10.1128\/AEM.03006-05","journal-title":"Appl Environ Microbiol"},{"key":"5729_CR5","doi-asserted-by":"publisher","first-page":"e1000255","DOI":"10.1371\/journal.pgen.1000255","volume":"4","author":"SM Huse","year":"2008","unstructured":"Huse SM, Dethlefsen L, Huber JA, Welch DM, Relman DA, Sogin ML: Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. PLoS Genet 2008, 4: e1000255. 
10.1371\/journal.pgen.1000255","journal-title":"PLoS Genet"},{"key":"5729_CR6","doi-asserted-by":"publisher","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","volume":"22","author":"W Li","year":"2006","unstructured":"Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22: 1658-1659. 10.1093\/bioinformatics\/btl158","journal-title":"Bioinformatics"},{"issue":"10","key":"5729_CR7","doi-asserted-by":"publisher","first-page":"e76","DOI":"10.1093\/nar\/gkp285","volume":"37","author":"Y Sun","year":"2009","unstructured":"Sun Y, Cai Y, Liu L, Yu F, Farrell ML, McKendree W, Farmerie W: ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Res 2009,37(10):e76. 10.1093\/nar\/gkp285","journal-title":"Nucleic Acids Res"},{"key":"5729_CR8","doi-asserted-by":"publisher","first-page":"1501","DOI":"10.1128\/AEM.71.3.1501-1506.2005","volume":"71","author":"PD Schloss","year":"2005","unstructured":"Schloss PD, Handelsman J: Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol 2005, 71: 1501-1506. 10.1128\/AEM.71.3.1501-1506.2005","journal-title":"Appl Environ Microbiol"},{"issue":"23","key":"5729_CR9","doi-asserted-by":"publisher","first-page":"7537","DOI":"10.1128\/AEM.01541-09","volume":"75","author":"PD Schloss","year":"2009","unstructured":"Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 2009,75(23):7537-7541. 
10.1128\/AEM.01541-09","journal-title":"Appl Environ Microbiol"},{"key":"5729_CR10","doi-asserted-by":"publisher","first-page":"e95","DOI":"10.1093\/nar\/gkr349","volume":"39","author":"Y Cai","year":"2011","unstructured":"Cai Y, Sun Y: ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Res 2011, 39: e95. 10.1093\/nar\/gkr349","journal-title":"Nucleic Acids Res"},{"issue":"19","key":"5729_CR11","doi-asserted-by":"publisher","first-page":"2460","DOI":"10.1093\/bioinformatics\/btq461","volume":"26","author":"RC Edgar","year":"2010","unstructured":"Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010,26(19):2460-2461. 10.1093\/bioinformatics\/btq461","journal-title":"Bioinformatics"},{"key":"5729_CR12","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1186\/1471-2105-11-152","volume":"11","author":"JR White","year":"2010","unstructured":"White JR, Navlakha S, Nagarajan N, Ghodsi M, Kingsford C, Pop M: Alignment and clustering of phylogenetic markers - implications for microbial diversity studies. BMC Bioinformatics 2010, 11: 152. 10.1186\/1471-2105-11-152","journal-title":"BMC Bioinformatics"},{"key":"5729_CR13","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1093\/bioinformatics\/btq725","volume":"27","author":"X Hao","year":"2011","unstructured":"Hao X, Jiang R, Chen T: Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering. Bioinformatics 2011, 27: 611-618. 10.1093\/bioinformatics\/btq725","journal-title":"Bioinformatics"},{"key":"5729_CR14","volume-title":"Nucleic Acids Res","author":"L Cheng","year":"2012","unstructured":"Cheng L, Walker AW, Corander J: Bayesian estimation of bacterial community composition from 454 sequencing data. Nucleic Acids Res 2012. 
10.1093\/nar\/gks227"},{"key":"5729_CR15","first-page":"056131","volume":"70","author":"MEJ Newman","year":"2004","unstructured":"Newman MEJ: Analysis of weighted networks. Phys Rev 2004, 70: 056131.","journal-title":"Phys Rev"},{"issue":"23","key":"5729_CR16","doi-asserted-by":"publisher","first-page":"8577","DOI":"10.1073\/pnas.0601602103","volume":"103","author":"MEJ Newman","year":"2006","unstructured":"Newman MEJ: Modularity and community structure in networks. PNAS 2006,103(23):8577-8582. 10.1073\/pnas.0601602103","journal-title":"PNAS"},{"issue":"3-5","key":"5729_CR17","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1016\/j.physrep.2009.11.002","volume":"486","author":"S Fortunato","year":"2010","unstructured":"Fortunato S: Community detection in graphs. Phys Rep 2010,486(3-5):75-174.","journal-title":"Phys Rep"},{"key":"5729_CR18","first-page":"1","volume-title":"J Stat Mech","author":"VD Blondel","year":"2008","unstructured":"Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E: Fast unfolding of communities in large networks. J Stat Mech 2008, 1-12. P10008"},{"issue":"5","key":"5729_CR19","first-page":"056117","volume":"80","author":"A Lancichinetti","year":"2009","unstructured":"Lancichinetti A, Fortunato S: Community detection algorithms: a comparative analysis. Phys Rev 2009,80(5):056117.","journal-title":"Phys Rev"},{"issue":"1","key":"5729_CR20","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1073\/pnas.0605965104","volume":"104","author":"S Fortunato","year":"2007","unstructured":"Fortunato S, Barthelemy M: Resolution limit in community detection. PNAS 2007,104(1):36-41. 10.1073\/pnas.0605965104","journal-title":"PNAS"},{"issue":"15","key":"5729_CR21","doi-asserted-by":"publisher","first-page":"3201","DOI":"10.1093\/bioinformatics\/bti517","volume":"21","author":"J Handl","year":"2005","unstructured":"Handl J, Knowles J, Kell DB: Computational cluster validation in post-genomic data analysis. 
Bioinformatics 2005,21(15):3201-3212. 10.1093\/bioinformatics\/bti517","journal-title":"Bioinformatics"},{"key":"5729_CR22","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"CD Manning","year":"2008","unstructured":"Manning CD, Raghavan P, Sch\u00fctze H: Introduction to Information Retrieval. Cambridge University Press; Online edition; 2008."},{"key":"5729_CR23","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1007\/s10791-008-9066-8","volume":"12","author":"E Amigo","year":"2009","unstructured":"Amigo E, Gonzalo J, Artiles J, Verdejo F: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf Retrieval 2009, 12: 461-486. 10.1007\/s10791-008-9066-8","journal-title":"Inf Retrieval"},{"key":"5729_CR24","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 1987, 20: 53-65.","journal-title":"J Comput Appl Math"},{"issue":"3","key":"5729_CR25","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1080\/01969727308546046","volume":"3","author":"JC Dunn","year":"1973","unstructured":"Dunn JC: A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. J Cybernetics 1973,3(3):32-57. 10.1080\/01969727308546046","journal-title":"J Cybernetics"},{"key":"5729_CR26","doi-asserted-by":"publisher","first-page":"1015","DOI":"10.3390\/d2071015","volume":"2","author":"A Giongo","year":"2010","unstructured":"Giongo A, Richardson AGD, Crabb DB, Triplett EW: Tax Collector: modifying current 16S rRNA databases for the rapid classification at six taxonomic levels. Diversity 2010, 2: 1015-1025. 
10.3390\/d2071015","journal-title":"Diversity"},{"key":"5729_CR27","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1038\/nature07540","volume":"457","author":"PJ Turnbaugh","year":"2009","unstructured":"Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE: A core gut microbiome in obese and lean twins. Nature 2009, 457: 480-484. 10.1038\/nature07540","journal-title":"Nature"},{"key":"5729_CR28","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1038\/ismej.2007.53","volume":"1","author":"FW Luiz","year":"2007","unstructured":"Luiz FW: Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J 2007, 1: 283-290.","journal-title":"ISME J"},{"issue":"1","key":"5729_CR29","first-page":"016104","volume":"77","author":"J Ruan","year":"2008","unstructured":"Ruan J, Zhang W: Identifying network communities with a high resolution. Phys Rev 2008,77(1):016104.","journal-title":"Phys Rev"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-14-43.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:21:57Z","timestamp":1630534917000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-14-43"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,2,7]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["5729"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-14-43","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,2,7]]},"assertion":[{"value":"19 June 2012","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 January 
2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 February 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"43"}}