{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T09:24:35Z","timestamp":1776417875495,"version":"3.51.2"},"reference-count":65,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T00:00:00Z","timestamp":1634601600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T00:00:00Z","timestamp":1634601600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000026","name":"National Institute on Drug Abuse","doi-asserted-by":"publisher","award":["R44DA04395402"],"award-info":[{"award-number":["R44DA04395402"]}],"id":[{"id":"10.13039\/100000026","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["GRFP"],"award-info":[{"award-number":["GRFP"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100005492","name":"Stanford University","doi-asserted-by":"crossref","award":["Stanford Bio-X"],"award-info":[{"award-number":["Stanford Bio-X"]}],"id":[{"id":"10.13039\/100005492","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Precision Health and Integrated Diagnostics Center at Stanford"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Sequencing partial 16S rRNA genes is a cost effective method for quantifying the microbial composition of an environment, such as the human gut. 
However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and <jats:italic>Micropheno<\/jats:italic> or <jats:italic>DiTaxa<\/jats:italic> features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods in both a permutation-test-based analysis and a linear discriminant analysis effect size analysis. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR &lt; 0.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. <jats:italic>Porphyromonadaceae<\/jats:italic>, <jats:italic>Ruminococcaceae<\/jats:italic>, and an unnamed species of <jats:italic>Blastocystis<\/jats:italic> were significantly enriched in autism, while <jats:italic>Veillonellaceae<\/jats:italic> was significantly depleted. 
Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR &lt; 0.1). We observed <jats:italic>Megasphaera<\/jats:italic> and <jats:italic>Sutterellaceae<\/jats:italic> highly enriched in obesity, and <jats:italic>Phocaeicola<\/jats:italic> significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of 0.64 and the obesity phenotype with an ROC-AUC score of 0.84.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/github.com\/briannachrisman\/16s_biomarkers\">http:\/\/github.com\/briannachrisman\/16s_biomarkers<\/jats:ext-link>.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-021-04427-7","type":"journal-article","created":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T14:01:51Z","timestamp":1634652111000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Improved detection of disease-associated gut microbes using 16S sequence-based biomarkers"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7157-607X","authenticated-orcid":false,"given":"Brianna S.","family":"Chrisman","sequence":"first","affiliation":[]},{"given":"Kelley 
M.","family":"Paskov","sequence":"additional","affiliation":[]},{"given":"Nate","family":"Stockham","sequence":"additional","affiliation":[]},{"given":"Jae-Yoon","family":"Jung","sequence":"additional","affiliation":[]},{"given":"Maya","family":"Varma","sequence":"additional","affiliation":[]},{"given":"Peter Y.","family":"Washington","sequence":"additional","affiliation":[]},{"given":"Christine","family":"Tataru","sequence":"additional","affiliation":[]},{"given":"Shoko","family":"Iwai","sequence":"additional","affiliation":[]},{"given":"Todd Z.","family":"DeSantis","sequence":"additional","affiliation":[]},{"given":"Maude","family":"David","sequence":"additional","affiliation":[]},{"given":"Dennis P.","family":"Wall","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,10,19]]},"reference":[{"issue":"8","key":"4427_CR1","doi-asserted-by":"publisher","first-page":"1002533","DOI":"10.1371\/journal.pbio.1002533","volume":"14","author":"R Sender","year":"2016","unstructured":"Sender R, Fuchs S, Milo R. Revised estimates for the number of human and bacteria cells in the body. PLoS Biol. 2016;14(8):1002533.","journal-title":"PLoS Biol"},{"issue":"1","key":"4427_CR2","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1016\/j.cell.2014.03.011","volume":"157","author":"Y Belkaid","year":"2014","unstructured":"Belkaid Y, Hand TW. Role of the microbiota in immunity and inflammation. Cell. 2014;157(1):121\u201341.","journal-title":"Cell"},{"issue":"7351","key":"4427_CR3","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1038\/nature10213","volume":"474","author":"AL Kau","year":"2011","unstructured":"Kau AL, Ahern PP, Griffin NW, Goodman AL, Gordon JI. Human nutrition, the gut microbiome and the immune system. Nature. 
2011;474(7351):327\u201336.","journal-title":"Nature"},{"issue":"4","key":"4427_CR4","doi-asserted-by":"publisher","first-page":"1877","DOI":"10.1152\/physrev.00018.2018","volume":"99","author":"JF Cryan","year":"2019","unstructured":"Cryan JF, O\u2019Riordan KJ, Cowan CS, Sandhu KV, Bastiaanssen TF, Boehme M, Codagnone MG, Cussotto S, Fulling C, Golubeva AV, et al. The microbiota-gut-brain axis. Physiol Rev. 2019;99(4):1877\u20132013.","journal-title":"Physiol Rev"},{"issue":"5","key":"4427_CR5","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1016\/j.tins.2013.01.005","volume":"36","author":"JA Foster","year":"2013","unstructured":"Foster JA, Neufeld K-AM. Gut-brain axis: how the microbiome influences anxiety and depression. Trends Neurosci. 2013;36(5):305\u201312.","journal-title":"Trends Neurosci"},{"issue":"16","key":"4427_CR6","doi-asserted-by":"publisher","first-page":"5227","DOI":"10.1128\/AEM.00592-09","volume":"75","author":"N Youssef","year":"2009","unstructured":"Youssef N, Sheik CS, Krumholz LR, Najar FZ, Roe BA, Elshahed MS. Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16s rrna gene-based environmental surveys. Appl Environ Microbiol. 2009;75(16):5227\u201336.","journal-title":"Appl Environ Microbiol"},{"issue":"3","key":"4427_CR7","doi-asserted-by":"publisher","first-page":"32491","DOI":"10.1371\/journal.pone.0032491","volume":"7","author":"Y Lan","year":"2012","unstructured":"Lan Y, Wang Q, Cole JR, Rosen GL. Using the rdp classifier to predict taxonomic novelty and reduce the search space for finding novel organisms. PLoS ONE. 2012;7(3):32491.","journal-title":"PLoS ONE"},{"issue":"1","key":"4427_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40168-020-00900-2","volume":"8","author":"J Lu","year":"2020","unstructured":"Lu J, Salzberg SL. Ultrafast and accurate 16s rrna microbial community analysis using kraken 2. Microbiome. 
2020;8(1):1\u201311.","journal-title":"Microbiome"},{"issue":"6","key":"4427_CR9","doi-asserted-by":"publisher","first-page":"1403","DOI":"10.1111\/1755-0998.12399","volume":"15","author":"J Bengtsson-Palme","year":"2015","unstructured":"Bengtsson-Palme J, Hartmann M, Eriksson KM, Pal C, Thorell K, Larsson DGJ, Nilsson RH. Metaxa2: improved identification and taxonomic classification of small and large subunit rrna in metagenomic data. Mol Ecol Resour. 2015;15(6):1403\u201314.","journal-title":"Mol Ecol Resour"},{"issue":"1","key":"4427_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-015-0747-1","volume":"16","author":"G Allard","year":"2015","unstructured":"Allard G, Ryan FJ, Jeffery IB, Claesson MJ. Spingo: a rapid species-classifier for microbial amplicon sequences. BMC Bioinform. 2015;16(1):1\u20138.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4427_CR11","doi-asserted-by":"publisher","first-page":"e00163-18","DOI":"10.1128\/mSystems.00163-18","volume":"4","author":"V Caruso","year":"2019","unstructured":"Caruso V, Song X, Asquith M, Karstens L. Performance of microbiome sequence inference methods in environments with varying biomass. MSystems. 2019;4(1):e00163-18.","journal-title":"MSystems"},{"issue":"12","key":"4427_CR12","doi-asserted-by":"publisher","first-page":"2639","DOI":"10.1038\/ismej.2017.119","volume":"11","author":"BJ Callahan","year":"2017","unstructured":"Callahan BJ, McMurdie PJ, Holmes SP. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 2017;11(12):2639\u201343.","journal-title":"ISME J"},{"key":"4427_CR13","doi-asserted-by":"publisher","unstructured":"Stevens BR, Roesch L, Thiago P, Russell JT, Pepine CJ, Holbert RC, Raizada MK, Triplett EW. Depression phenotype identified by using single nucleotide exact amplicon sequence variants of the human gut microbiome. Mol Psychiatry. 2020;1\u201311. 
https:\/\/doi.org\/10.1038\/s41380-020-0652-5.","DOI":"10.1038\/s41380-020-0652-5"},{"issue":"4","key":"4427_CR14","doi-asserted-by":"publisher","first-page":"1006102","DOI":"10.1371\/journal.pcbi.1006102","volume":"14","author":"SM Gibbons","year":"2018","unstructured":"Gibbons SM, Duvallet C, Alm EJ. Correcting for batch effects in case-control microbiome studies. PLoS Comput Biol. 2018;14(4):1006102.","journal-title":"PLoS Comput Biol"},{"issue":"1","key":"4427_CR15","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1186\/s12864-018-5160-5","volume":"19","author":"MS Kumar","year":"2018","unstructured":"Kumar MS, Slud EV, Okrah K, Hicks SC, Hannenhalli S, Bravo HC. Analysis and correction of compositional bias in sparse sequencing count data. BMC Genom. 2018;19(1):799.","journal-title":"BMC Genom"},{"issue":"9","key":"4427_CR16","first-page":"1","volume":"20","author":"I Patuzzi","year":"2019","unstructured":"Patuzzi I, Baruzzo G, Losasso C, Ricci A, Di Camillo B. metasparsim: a 16s rrna gene sequencing count data simulator. BMC Bioinform. 2019;20(9):1\u201313.","journal-title":"BMC Bioinform"},{"key":"4427_CR17","doi-asserted-by":"publisher","first-page":"5364","DOI":"10.7717\/peerj.5364","volume":"6","author":"JT Nearing","year":"2018","unstructured":"Nearing JT, Douglas GM, Comeau AM, Langille MG. Denoising the denoisers: an independent evaluation of microbiome sequence error-correction approaches. PeerJ. 2018;6:5364.","journal-title":"PeerJ"},{"issue":"12","key":"4427_CR18","doi-asserted-by":"publisher","first-page":"3886","DOI":"10.1128\/AEM.02953-09","volume":"76","author":"AY Pei","year":"2010","unstructured":"Pei AY, Oberdorf WE, Nossa CW, Agarwal A, Chokshi P, Gerz EA, Jin Z, Lee P, Yang L, Poles M, et al. Diversity of 16s rrna genes within individual prokaryotic genomes. Appl Environ Microbiol. 
2010;76(12):3886\u201397.","journal-title":"Appl Environ Microbiol"},{"issue":"D1","key":"4427_CR19","doi-asserted-by":"publisher","first-page":"633","DOI":"10.1093\/nar\/gkt1244","volume":"42","author":"JR Cole","year":"2014","unstructured":"Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. Ribosomal database project: data and tools for high throughput rrna analysis. Nucleic Acids Res. 2014;42(D1):633\u201342.","journal-title":"Nucleic Acids Res"},{"issue":"7","key":"4427_CR20","doi-asserted-by":"publisher","first-page":"5069","DOI":"10.1128\/AEM.03006-05","volume":"72","author":"TZ DeSantis","year":"2006","unstructured":"DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL. Greengenes, a chimera-checked 16s rrna gene database and workbench compatible with arb. Appl Environ Microbiol. 2006;72(7):5069\u201372.","journal-title":"Appl Environ Microbiol"},{"issue":"D1","key":"4427_CR21","doi-asserted-by":"publisher","first-page":"590","DOI":"10.1093\/nar\/gks1219","volume":"41","author":"C Quast","year":"2012","unstructured":"Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Gl\u00f6ckner FO. The silva ribosomal rna gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2012;41(D1):590\u20136.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"4427_CR22","doi-asserted-by":"publisher","first-page":"733","DOI":"10.1093\/nar\/gkv1189","volume":"44","author":"NA O\u2019Leary","year":"2016","unstructured":"O\u2019Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, et al. Reference sequence (refseq) database at ncbi: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 
2016;44(D1):733\u201345.","journal-title":"Nucleic Acids Res"},{"issue":"7753","key":"4427_CR23","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1038\/s41586-019-0965-1","volume":"568","author":"A Almeida","year":"2019","unstructured":"Almeida A, Mitchell AL, Boland M, Forster SC, Gloor GB, Tarkowska A, Lawley TD, Finn RD. A new genomic blueprint of the human gut microbiota. Nature. 2019;568(7753):499\u2013504.","journal-title":"Nature"},{"issue":"1","key":"4427_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/ncomms3304","volume":"4","author":"N Segata","year":"2013","unstructured":"Segata N, B\u00f6rnigen D, Morgan XC, Huttenhower C. Phylophlan is a new method for improved phylogenetic and taxonomic placement of microbes. Nat Commun. 2013;4(1):1\u201311.","journal-title":"Nat Commun"},{"issue":"1","key":"4427_CR25","doi-asserted-by":"publisher","first-page":"1056","DOI":"10.1186\/s12864-015-2265-y","volume":"16","author":"J Ritari","year":"2015","unstructured":"Ritari J, Saloj\u00e4rvi J, Lahti L, de Vos WM. Improved taxonomic assignment of human intestinal 16s rrna sequences by a dedicated reference database. BMC Genom. 2015;16(1):1056.","journal-title":"BMC Genom"},{"issue":"11","key":"4427_CR26","doi-asserted-by":"publisher","first-page":"2399","DOI":"10.1038\/ismej.2017.113","volume":"11","author":"KT Konstantinidis","year":"2017","unstructured":"Konstantinidis KT, Rossell\u00f3-M\u00f3ra R, Amann R. Uncultivated microbes in need of their own taxonomy. ISME J. 2017;11(11):2399\u2013406.","journal-title":"ISME J"},{"issue":"3","key":"4427_CR27","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1007\/s00203-014-1071-2","volume":"197","author":"CC Thompson","year":"2015","unstructured":"Thompson CC, Amaral GR, Campe\u00e3o M, Edwards RA, Polz MF, Dutilh BE, Ussery DW, Sawabe T, Swings J, Thompson FL. Microbial taxonomy in the post-genomic era: rebuilding from scratch? Arch Microbiol. 
2015;197(3):359\u201370.","journal-title":"Arch Microbiol"},{"issue":"1","key":"4427_CR28","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1007\/s10482-014-0148-x","volume":"106","author":"P Vandamme","year":"2014","unstructured":"Vandamme P, Peeters C. Time to revisit polyphasic taxonomy. Antonie Van Leeuwenhoek. 2014;106(1):57\u201365.","journal-title":"Antonie Van Leeuwenhoek"},{"issue":"1","key":"4427_CR29","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1099\/ijsem.0.002516","volume":"68","author":"J Chun","year":"2018","unstructured":"Chun J, Oren A, Ventosa A, Christensen H, Arahal DR, da Costa MS, Rooney AP, Yi H, Xu X-W, De Meyer S, et al. Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int J Syst Evol Microbiol. 2018;68(1):461\u20136.","journal-title":"Int J Syst Evol Microbiol"},{"issue":"1","key":"4427_CR30","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1186\/s12859-018-2349-1","volume":"19","author":"R M\u00fcller","year":"2018","unstructured":"M\u00fcller R, Nebel ME. Gefast: an improved method for otu assignment by generalising swarm\u2019s fastidious clustering approach. BMC Bioinform. 2018;19(1):321.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4427_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-12-271","volume":"12","author":"M Ghodsi","year":"2011","unstructured":"Ghodsi M, Liu B, Pop M. Dnaclust: accurate and efficient clustering of phylogenetic marker genes. BMC Bioinform. 2011;12(1):1\u201311.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4427_CR32","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1186\/1471-2105-14-43","volume":"14","author":"X Wang","year":"2013","unstructured":"Wang X, Yao J, Sun Y, Mai V. M-pick, a modularity-based method for otu picking of 16s rrna sequences. BMC Bioinform. 
2013;14(1):43.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4427_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-019-13036-1","volume":"10","author":"JS Johnson","year":"2019","unstructured":"Johnson JS, Spakowicz DJ, Hong B-Y, Petersen LM, Demkowicz P, Chen L, Leopold SR, Hanson BM, Agresta HO, Gerstein M, et al. Evaluation of 16s rrna gene sequencing for species and strain-level microbiome analysis. Nat Commun. 2019;10(1):1\u201311.","journal-title":"Nat Commun"},{"issue":"1","key":"4427_CR34","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1186\/1471-2105-11-152","volume":"11","author":"JR White","year":"2010","unstructured":"White JR, Navlakha S, Nagarajan N, Ghodsi M-R, Kingsford C, Pop M. Alignment and clustering of phylogenetic markers-implications for microbial diversity studies. BMC Bioinform. 2010;11(1):152.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4427_CR35","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s40168-015-0081-x","volume":"3","author":"Y He","year":"2015","unstructured":"He Y, Caporaso JG, Jiang X-T, Sheng H-F, Huse SM, Rideout JR, Edgar RC, Kopylova E, Walters WA, Knight R, et al. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity. Microbiome. 2015;3(1):20.","journal-title":"Microbiome"},{"issue":"1","key":"4427_CR36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/npjbiofilms.2016.4","volume":"2","author":"N-P Nguyen","year":"2016","unstructured":"Nguyen N-P, Warnow T, Pop M, White B. A perspective on 16s rrna operational taxonomic unit clustering using sequence similarity. NPJ Biofilms Microbiomes. 2016;2(1):1\u20138.","journal-title":"NPJ Biofilms Microbiomes"},{"issue":"8","key":"4427_CR37","doi-asserted-by":"publisher","first-page":"70837","DOI":"10.1371\/journal.pone.0070837","volume":"8","author":"W Chen","year":"2013","unstructured":"Chen W, Zhang CK, Cheng Y, Zhang S, Zhao H. 
A comparison of methods for clustering 16s rrna sequences into otus. PLoS ONE. 2013;8(8):70837.","journal-title":"PLoS ONE"},{"issue":"14","key":"4427_CR38","doi-asserted-by":"publisher","first-page":"2498","DOI":"10.1093\/bioinformatics\/bty954","volume":"35","author":"E Asgari","year":"2019","unstructured":"Asgari E, M\u00fcnch PC, Lesker TR, McHardy AC, Mofrad MR. Ditaxa: nucleotide-pair encoding of 16s rrna for host phenotype and biomarker detection. Bioinformatics. 2019;35(14):2498\u2013500.","journal-title":"Bioinformatics"},{"issue":"13","key":"4427_CR39","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1093\/bioinformatics\/bty296","volume":"34","author":"E Asgari","year":"2018","unstructured":"Asgari E, Garakani K, McHardy AC, Mofrad MR. Micropheno: predicting environments and host phenotypes from 16s rrna gene sequencing using a k-mer based representation of shallow sub-samples. Bioinformatics. 2018;34(13):32\u201342.","journal-title":"Bioinformatics"},{"issue":"1","key":"4427_CR40","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1038\/ismej.2011.82","volume":"6","author":"JJ Werner","year":"2012","unstructured":"Werner JJ, Koren O, Hugenholtz P, DeSantis TZ, Walters WA, Caporaso JG, Angenent LT, Knight R, Ley RE. Impact of training sets on classification of high-throughput bacterial 16s rrna gene surveys. ISME J. 2012;6(1):94\u2013103.","journal-title":"ISME J"},{"issue":"1","key":"4427_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-016-0992-y","volume":"17","author":"B Yang","year":"2016","unstructured":"Yang B, Wang Y, Qian P-Y. Sensitivity and correlation of hypervariable regions in 16s rrna genes in phylogenetic analysis. BMC Bioinform. 
2016;17(1):1\u20138.","journal-title":"BMC Bioinform"},{"issue":"6","key":"4427_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/gb-2011-12-6-r60","volume":"12","author":"N Segata","year":"2011","unstructured":"Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, Huttenhower C. Metagenomic biomarker discovery and explanation. Genome Biol. 2011;12(6):1\u201318.","journal-title":"Genome Biol"},{"issue":"1","key":"4427_CR43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/2049-2618-1-11","volume":"1","author":"A Statnikov","year":"2013","unstructured":"Statnikov A, Henaff M, Narendra V, Konganti K, Li Z, Yang L, Pei Z, Blaser MJ, Aliferis CF, Alekseyenko AV. A comprehensive evaluation of multicategory classification methods for microbiomic data. Microbiome. 2013;1(1):1\u201312.","journal-title":"Microbiome"},{"issue":"5","key":"4427_CR44","doi-asserted-by":"publisher","first-page":"1054","DOI":"10.1016\/j.cmet.2017.04.001","volume":"25","author":"R Loomba","year":"2017","unstructured":"Loomba R, Seguritan V, Li W, Long T, Klitgord N, Bhatt A, Dulai PS, Caussy C, Bettencourt R, Highlander SK, et al. Gut microbiome-based metagenomic signature for non-invasive detection of advanced fibrosis in human nonalcoholic fatty liver disease. Cell Metab. 2017;25(5):1054\u201362.","journal-title":"Cell Metab"},{"issue":"2","key":"4427_CR45","doi-asserted-by":"publisher","first-page":"104","DOI":"10.3390\/genes9020104","volume":"9","author":"A Belk","year":"2018","unstructured":"Belk A, Xu ZZ, Carter DO, Lynne A, Bucheli S, Knight R, Metcalf JL. Microbiome data accurately predicts the postmortem interval using random forest regression models. Genes. 2018;9(2):104.","journal-title":"Genes"},{"key":"4427_CR46","doi-asserted-by":"publisher","first-page":"190007","DOI":"10.1038\/sdata.2019.7","volume":"6","author":"YS Bukin","year":"2019","unstructured":"Bukin YS, Galachyants YP, Morozov I, Bukin S, Zakharenko A, Zemskaya T. 
The effect of 16s rrna region choice on bacterial community metabarcoding results. Sci data. 2019;6:190007.","journal-title":"Sci data"},{"issue":"1","key":"4427_CR47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12887-019-1896-6","volume":"19","author":"H Sun","year":"2019","unstructured":"Sun H, You Z, Jia L, Wang F. Autism spectrum disorder is associated with gut microbiota disorder in children. BMC Pediatr. 2019;19(1):1\u20137.","journal-title":"BMC Pediatr"},{"issue":"7","key":"4427_CR48","doi-asserted-by":"publisher","first-page":"68322","DOI":"10.1371\/journal.pone.0068322","volume":"8","author":"D-W Kang","year":"2013","unstructured":"Kang D-W, Park JG, Ilhan ZE, Wallstrom G, LaBaer J, Adams JB, Krajmalnik-Brown R. Reduced incidence of prevotella and other fermenters in intestinal microflora of autistic children. PLoS ONE. 2013;8(7):68322.","journal-title":"PLoS ONE"},{"key":"4427_CR49","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1016\/j.bbi.2015.03.016","volume":"48","author":"H Jiang","year":"2015","unstructured":"Jiang H, Ling Z, Zhang Y, Mao H, Ma Z, Yin Y, Wang W, Tang W, Tan Z, Shi J, et al. Altered fecal microbiota composition in patients with major depressive disorder. Brain Behav Immun. 2015;48:186\u201394.","journal-title":"Brain Behav Immun"},{"issue":"7228","key":"4427_CR50","doi-asserted-by":"publisher","first-page":"480","DOI":"10.1038\/nature07540","volume":"457","author":"PJ Turnbaugh","year":"2009","unstructured":"Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, et al. A core gut microbiome in obese and lean twins. Nature. 2009;457(7228):480.","journal-title":"Nature"},{"issue":"7","key":"4427_CR51","doi-asserted-by":"publisher","first-page":"0134333","DOI":"10.1371\/journal.pone.0134333","volume":"10","author":"H-J Hu","year":"2015","unstructured":"Hu H-J, Park S-G, Jang HB, Choi M-G, Park K-H, Kang JH, Park SI, Lee H-J, Cho S-H. 
Obesity alters the microbial community profile in Korean adolescents. PLoS ONE. 2015;10(7):0134333.","journal-title":"PLoS ONE"},{"key":"4427_CR52","doi-asserted-by":"publisher","first-page":"1502","DOI":"10.3389\/fmicb.2017.01502","volume":"8","author":"K Dougal","year":"2017","unstructured":"Dougal K, Harris PA, Girdwood SE, Creevey CJ, Curtis GC, Barfoot CF, Argo CM, Newbold CJ. Changes in the total fecal bacterial population in individual horses maintained on a restricted diet over 6 weeks. Front Microbiol. 2017;8:1502.","journal-title":"Front Microbiol"},{"issue":"7","key":"4427_CR53","doi-asserted-by":"publisher","first-page":"1180","DOI":"10.1136\/gutjnl-2018-316106","volume":"68","author":"RY Tito","year":"2019","unstructured":"Tito RY, Chaffron S, Caenepeel C, Lima-Mendez G, Wang J, Vieira-Silva S, Falony G, Hildebrand F, Darzi Y, Rymenans L, et al. Population-level analysis of blastocystis subtype prevalence and variation in the human gut microbiota. Gut. 2019;68(7):1180\u20139.","journal-title":"Gut"},{"issue":"7","key":"4427_CR54","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1038\/ismej.2011.186","volume":"6","author":"JJ Werner","year":"2012","unstructured":"Werner JJ, Zhou D, Caporaso JG, Knight R, Angenent LT. Comparison of illumina paired-end and single-direction sequencing for microbial 16s rrna gene amplicon surveys. ISME J. 2012;6(7):1273\u20136.","journal-title":"ISME J"},{"issue":"2","key":"4427_CR55","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1309\/AJCPDOWQSL6E8DMN","volume":"133","author":"CM Vassalos","year":"2010","unstructured":"Vassalos CM, Spanakos G, Vassalou E, Papadopoulou C, Vakalis N. Differences in clinical significance and morphologic features of Blastocystis sp subtype 3. Am J Clin Pathol. 
2010;133(2):251\u20138.","journal-title":"Am J Clin Pathol"},{"issue":"4","key":"4427_CR56","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1128\/CMR.00022-08","volume":"21","author":"KS Tan","year":"2008","unstructured":"Tan KS. New insights on classification, identification, and clinical relevance of Blastocystis spp. Clin Microbiol Rev. 2008;21(4):639\u201365.","journal-title":"Clin Microbiol Rev"},{"issue":"3","key":"4427_CR57","doi-asserted-by":"publisher","first-page":"652","DOI":"10.1016\/j.mehy.2007.01.027","volume":"69","author":"KF Boorom","year":"2007","unstructured":"Boorom KF. Is this recently characterized gastrointestinal pathogen responsible for rising rates of inflammatory bowel disease (ibd) and ibd associated autism in europe and the united states in the 1990s? Med Hypotheses. 2007;69(3):652\u20139.","journal-title":"Med Hypotheses"},{"key":"4427_CR58","unstructured":"PubMed. U.S. National Library of Medicine. http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/. Accessed 1 Aug 2020."},{"issue":"7","key":"4427_CR59","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1038\/nmeth.3869","volume":"13","author":"BJ Callahan","year":"2016","unstructured":"Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. Dada2: high-resolution sample inference from illumina amplicon data. Nat Methods. 2016;13(7):581.","journal-title":"Nat Methods"},{"issue":"10","key":"4427_CR60","doi-asserted-by":"publisher","first-page":"796","DOI":"10.1038\/s41592-018-0141-9","volume":"15","author":"A Gonzalez","year":"2018","unstructured":"Gonzalez A, Navas-Molina JA, Kosciolek T, McDonald D, V\u00e1zquez-Baeza Y, Ackermann G, DeReus J, Janssen S, Swafford AD, Orchanian SB, et al. Qiita: rapid, web-enabled microbiome meta-analysis. Nat Methods. 2018;15(10):796.","journal-title":"Nat Methods"},{"key":"4427_CR61","unstructured":"A core gut microbiome in obese and lean twins. - ID 77. https:\/\/qiita.ucsd.edu\/public\/?artifact_id=6821. 
Accessed 12 Oct 2019."},{"key":"4427_CR62","doi-asserted-by":"publisher","first-page":"2224","DOI":"10.3389\/fmicb.2017.02224","volume":"8","author":"GB Gloor","year":"2017","unstructured":"Gloor GB, Macklaim JM, Pawlowsky-Glahn V, Egozcue JJ. Microbiome datasets are compositional: and this is not optional. Front Microbiol. 2017;8:2224.","journal-title":"Front Microbiol"},{"issue":"1","key":"4427_CR63","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-019-10656-5","volume":"10","author":"JT Morton","year":"2019","unstructured":"Morton JT, Marotz C, Washburne A, Silverman J, Zaramela LS, Edlund A, Zengler K, Knight R. Establishing microbial composition measurement standards with reference frames. Nat Commun. 2019;10(1):1\u201311.","journal-title":"Nat Commun"},{"issue":"1","key":"4427_CR64","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1751-0473-3-15","volume":"3","author":"A Camargo","year":"2008","unstructured":"Camargo A, Azuaje F, Wang H, Zheng H. Permutation-based statistical tests for multiple hypotheses. Source Code Biol Med. 2008;3(1):1\u20138.","journal-title":"Source Code Biol Med"},{"issue":"22","key":"4427_CR65","doi-asserted-by":"publisher","first-page":"2933","DOI":"10.1093\/bioinformatics\/btt509","volume":"29","author":"EP Nawrocki","year":"2013","unstructured":"Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster rna homology searches. Bioinformatics. 
2013;29(22):2933\u20135.","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04427-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04427-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04427-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,7]],"date-time":"2023-02-07T19:05:27Z","timestamp":1675796727000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04427-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,19]]},"references-count":65,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["4427"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04427-7","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,19]]},"assertion":[{"value":"20 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 October 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The Stanford University Institutional Review Board reviewed and approved the study (IRB 30205) prior to any human subjects research activities taking place. 
Informed consent was collected electronically online from the parent\/legal guardian of all participants prior to participants completing research activities. The obesity data come from a previously published paper and were already publicly available.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare a conflict of interest. The authors affiliated with Second Genome, Inc. have the following competing interests: this work was supported by Second Genome Inc., which employs and provides stock options to TD and SI. Second Genome Inc. is an independent therapeutics company with products in development to treat human disease. A publication announcing the viability of using sequence-based biomarker analysis for academic use will not affect the value of SG\u2019s therapeutic products. MD has financial interests in\/relative to Second Genome, a company that could benefit from the conduct or outcomes of this research.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"509"}}