{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:51Z","timestamp":1772138091069,"version":"3.50.1"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2018,11,30]],"date-time":"2018-11-30T00:00:00Z","timestamp":1543536000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001659","name":"German Research Foundation","doi-asserted-by":"publisher","award":["315980449"],"award-info":[{"award-number":["315980449"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>Identifying distinctive taxa for micro-biome-related diseases is considered key to the establishment of diagnosis and therapy options in precision medicine and imposes high demands on the accuracy of micro-biome analysis techniques. We propose an alignment- and reference- free subsequence based 16S rRNA data analysis, as a new paradigm for micro-biome phenotype and biomarker detection. Our method, called DiTaxa, substitutes standard operational taxonomic unit (OTU)-clustering by segmenting 16S rRNA reads into the most frequent variable-length subsequences. We compared the performance of DiTaxa to the state-of-the-art methods in phenotype and biomarker detection, using human-associated 16S rRNA samples for periodontal disease, rheumatoid arthritis and inflammatory bowel diseases, as well as a synthetic benchmark dataset. DiTaxa performed competitively to the k-mer based state-of-the-art approach in phenotype prediction while outperforming the OTU-based state-of-the-art approach in finding biomarkers in both resolution and coverage evaluated over known links from literature and synthetic benchmark datasets.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>DiTaxa is available under the Apache 2 license at http:\/\/llp.berkeley.edu\/ditaxa.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty954","type":"journal-article","created":{"date-parts":[[2018,11,28]],"date-time":"2018-11-28T23:23:53Z","timestamp":1543447433000},"page":"2498-2500","source":"Crossref","is-referenced-by-count":10,"title":["DiTaxa: nucleotide-pair encoding of 16S rRNA for host phenotype and biomarker detection"],"prefix":"10.1093","volume":"35","author":[{"given":"Ehsaneddin","family":"Asgari","sequence":"first","affiliation":[{"name":"Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA"},{"name":"Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Brunswick, Germany"}]},{"given":"Philipp C","family":"M\u00fcnch","sequence":"additional","affiliation":[{"name":"Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Brunswick, Germany"},{"name":"Faculty of Medicine, LMU Munich, Max von Pettenkofer-Institute of Hygiene and Medical Microbiology, Munich, Germany"}]},{"given":"Till R","family":"Lesker","sequence":"additional","affiliation":[{"name":"Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Brunswick, Germany"}]},{"given":"Alice C","family":"McHardy","sequence":"additional","affiliation":[{"name":"Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Brunswick, Germany"}]},{"given":"Mohammad R K","family":"Mofrad","sequence":"additional","affiliation":[{"name":"Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA"},{"name":"Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Lab, Berkeley, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,11,30]]},"reference":[{"key":"2023062712305866400_bty954-B1","doi-asserted-by":"crossref","first-page":"i32","DOI":"10.1093\/bioinformatics\/bty296","article-title":"MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples","volume":"34","author":"Asgari","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062712305866400_bty954-B2","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/nmeth.3869","article-title":"DADA2: high-resolution sample inference from Illumina amplicon data","volume":"13","author":"Callahan","year":"2016","journal-title":"Nat. Methods"},{"key":"2023062712305866400_bty954-B3","first-page":"62","article-title":"Compressed pattern matching in DNA sequences","volume-title":"2004 IEEE Proceedings of 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004","author":"Chen","year":"2004"},{"key":"2023062712305866400_bty954-B4","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1038\/nmeth.2604","article-title":"UPARSE: highly accurate OTU sequences from microbial amplicon reads","volume":"10","author":"Edgar","year":"2013","journal-title":"Nat. Methods"},{"key":"2023062712305866400_bty954-B5","first-page":"23","article-title":"A new algorithm for data compression","volume":"12","author":"Gage","year":"1994","journal-title":"C Users J."},{"key":"2023062712305866400_bty954-B6","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1016\/j.chom.2014.02.005","article-title":"The treatment-naive microbiome in new-onset Crohn\u2019s disease","volume":"15","author":"Gevers","year":"2014","journal-title":"Cell Host Microbe"},{"key":"2023062712305866400_bty954-B7","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1016\/j.chom.2011.10.006","article-title":"Low-abundance biofilm species orchestrates inflammatory periodontal disease through the commensal microbiota and complement","volume":"10","author":"Hajishengallis","year":"2011","journal-title":"Cell Host Microbe"},{"key":"2023062712305866400_bty954-B8","doi-asserted-by":"crossref","first-page":"e01012","DOI":"10.1128\/mBio.01012-14","article-title":"Metatranscriptomics of the human oral microbiome during health and disease","volume":"5","author":"Jorth","year":"2014","journal-title":"MBio"},{"key":"2023062712305866400_bty954-B9","doi-asserted-by":"crossref","first-page":"1167","DOI":"10.1902\/jop.2000.71.7.1167","article-title":"Induction of experimental periodontitis in mice with Porphyromonas gingivalis-adhered ligatures","volume":"71","author":"Kimura","year":"2000","journal-title":"J. Periodontol."},{"key":"2023062712305866400_bty954-B10","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1111\/j.1600-051X.2009.01393.x","article-title":"Mouse model of experimental periodontitis induced by Porphyromonas gingivalis Fusobacterium nucleatum infection: bone loss and host response","volume":"36","author":"Polak","year":"2009","journal-title":"J. Clin. Periodontol."},{"key":"2023062712305866400_bty954-B11","doi-asserted-by":"crossref","DOI":"10.1128\/AEM.02627-17","article-title":"The madness of microbiome: attempting to find consensus \u2018best practice\u2019 for 16S microbiome studies","volume":"84","author":"Pollock","year":"2018","journal-title":"Appl. Environ. Microbiol."},{"key":"2023062712305866400_bty954-B12","doi-asserted-by":"crossref","first-page":"e01202","DOI":"10.7554\/eLife.01202","article-title":"Expansion of intestinal Prevotella copri correlates with enhanced susceptibility to arthritis","volume":"2013","author":"Scher","year":"2013","journal-title":"eLife"},{"key":"2023062712305866400_bty954-B13","doi-asserted-by":"crossref","first-page":"R60","DOI":"10.1186\/gb-2011-12-6-r60","article-title":"Metagenomic biomarker discovery and explanation","volume":"12","author":"Segata","year":"2011","journal-title":"Genome Biol."},{"key":"2023062712305866400_bty954-B14","first-page":"1715","article-title":"Neural machine translation of rare words with subword units","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics","author":"Sennrich","year":"2016"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/14\/2498\/50720509\/bioinformatics_35_14_2498.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/14\/2498\/50720509\/bioinformatics_35_14_2498.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T08:31:30Z","timestamp":1687854690000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/14\/2498\/5221016"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,11,30]]},"references-count":14,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2019,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty954","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/334722","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,7]]},"published":{"date-parts":[[2018,11,30]]}}}