{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,8,20]],"date-time":"2023-08-20T21:35:45Z","timestamp":1692567345012},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: As next generation sequencing is rapidly adding new genomes, their correct placement in the taxonomy needs verification. However, the current methods for confirming the classification of a taxon or suggesting a revision for a potential misplacement rely on computationally intense multi-sequence alignment followed by an iterative adjustment of the distance matrix. Due to intra-heterogeneity issues with the 16S rRNA marker, no classifier is available at the sub-genus level that could readily suggest a classification for a novel 16S rRNA sequence. Metagenomics further complicates the issue by generating fragmented 16S rRNA sequences. This article proposes a novel alignment-free method for representing microbial profiles using extensible Markov models (EMMs) with an extended Karlin\u2013Altschul statistical framework similar to the classic alignment paradigm. We propose a log odds (LODs) score classifier based on the Gumbel difference distribution that confirms correct classifications with statistical significance qualifications and suggests revisions where necessary.<\/jats:p>\n               <jats:p>Results: We tested our method by generating a sub-genus level classifier with which we re-evaluated classifications of 676 microbial organisms using the NCBI FTP database for the 16S rRNA. The results confirm current classification for all genera while ascertaining significance at 95%. 
Furthermore, this novel classifier isolates heterogeneity issues to a mere 12 strains while confirming classifications with significance qualification for the remaining 98%. The models require less memory than that needed by multi-sequence alignments and have better time complexity than the current methods. The classifier operates at the sub-genus level and thus outperforms the naive Bayes classifier of the Ribosomal Database Project, where much of the taxonomic analysis is available online. Finally, using information redundancy in model building, we show that the method applies to metagenomic fragment classification of 19 Escherichia coli strains.<\/jats:p>\n               <jats:p>Availability and implementation: Source code and binaries are freely available for download at http:\/\/lyle.smu.edu\/IDA\/EMMSA\/; implemented in Java and supported on MS Windows.<\/jats:p>\n               <jats:p>Contact: \u00a0mallik@kotamarti.com; mhd@lyle.smu.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq349","type":"journal-article","created":{"date-parts":[[2010,7,13]],"date-time":"2010-07-13T00:41:08Z","timestamp":1278981668000},"page":"2235-2241","source":"Crossref","is-referenced-by-count":11,"title":["Analyzing taxonomic classification using extensible Markov models"],"prefix":"10.1093","volume":"26","author":[{"given":"Rao M.","family":"Kotamarti","sequence":"first","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Computer Science, University of Montana, MT 59812 and 3Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA"}]},{"given":"Michael","family":"Hahsler","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Computer Science, University of Montana, MT 59812 and 3Department of Statistical Science, Southern Methodist 
University, Dallas, TX 75275, USA"}]},{"given":"Douglas","family":"Raiford","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Computer Science, University of Montana, MT 59812 and 3Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA"}]},{"given":"Monnie","family":"McGee","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Computer Science, University of Montana, MT 59812 and 3Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA"}]},{"given":"Margaret H.","family":"Dunham","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science and Engineering, 2Department of Computer Science, University of Montana, MT 59812 and 3Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,7,12]]},"reference":[{"key":"2023012508244672200_B1","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1128\/AEM.01177-06","article-title":"Use of 16S rRNA and rpoB genes as molecular markers for microbial ecology studies","volume":"73","author":"Case","year":"2007","journal-title":"Appl. Environ. Microbiol"},{"key":"2023012508244672200_B2","doi-asserted-by":"crossref","first-page":"D141","DOI":"10.1093\/nar\/gkn879","article-title":"The Ribosomal Database Project: improved alignments and new tools for rRNA analysis","volume":"37","author":"Cole","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023012508244672200_B3","doi-asserted-by":"crossref","first-page":"3376","DOI":"10.1128\/AEM.66.8.3376-3380.2000","article-title":"rpoB-based microbial community analysis avoids limitations inherent in 16S rRNA gene intraspecies heterogeneity","volume":"66","author":"Dahll\u00f6f","year":"2000","journal-title":"Appl. Environ. 
Microbiol"},{"key":"2023012508244672200_B4","first-page":"371","article-title":"Extensible Markov model","volume-title":"Proceedings of the Fourth IEEE International Conference on Data Mining ICDM '04","author":"Dunham","year":"2004"},{"key":"2023012508244672200_B5","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023012508244672200_B6","doi-asserted-by":"crossref","first-page":"2079","DOI":"10.1093\/nar\/22.11.2079","article-title":"RNA sequence analysis using covariance models","volume":"22","author":"Eddy","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2023012508244672200_B7","volume-title":"Bergey's Manual of Systematic Bacteriology, Second Edition","author":"Garrity","year":"2005","edition":"2nd"},{"key":"2023012508244672200_B8","doi-asserted-by":"crossref","first-page":"2309","DOI":"10.1093\/bioinformatics\/bti346","article-title":"Self-organizing and self-correcting classifications of biological data","volume":"21","author":"Garrity","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508244672200_B9","volume-title":"BLAST","author":"Ian","year":"2003"},{"key":"2023012508244672200_B10","doi-asserted-by":"crossref","first-page":"2761","DOI":"10.1128\/JCM.01228-07","article-title":"16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls","volume":"45","author":"Janda","year":"2007","journal-title":"J. Clin. Microbiol"},{"key":"2023012508244672200_B11","doi-asserted-by":"crossref","first-page":"2264","DOI":"10.1073\/pnas.87.6.2264","article-title":"Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes","volume":"87","author":"Karlin","year":"1990","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012508244672200_B12","article-title":"Alignment-free sequence analysis using Extensible Markov Models","author":"Kotamarti","year":"2010","journal-title":"9th International Workshop on Data Mining in Bioinformatics (BIOKDD'10)"},{"key":"2023012508244672200_B13","article-title":"Targeted genomic signature profiling with Quasi-alignment statistics","author":"Kotamarti","year":"2009","journal-title":"COBRA Preprint Series"},{"key":"2023012508244672200_B14","first-page":"1","article-title":"Sequence transformation to a complex signature form for consistent phylogenetic tree using extensible Markov model","volume-title":"Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on","author":"Kotamarti","year":"2010"},{"key":"2023012508244672200_B15","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1099\/ijs.0.02749-0","article-title":"Exploring prokaryotic taxonomy","volume":"54","author":"Lilburn","year":"2004","journal-title":"Int. J. Syst. Evol. 
Microbiol"},{"key":"2023012508244672200_B16","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1093\/bib\/bbl005","article-title":"Computational aspects of systematic biology","volume":"7","author":"Lilburn","year":"2006","journal-title":"Brief Bioinform"},{"key":"2023012508244672200_B17","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1109\/GRC.2006.1635779","article-title":"Online mining of risk level of traffic anomalies with user's feedbacks","volume-title":"Granular Computing, 2006 IEEE International Conference","author":"Meng","year":"2006"},{"key":"2023012508244672200_B18","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1109\/GRC.2006.1635881","article-title":"Rare event detection in a spatiotemporal environment","volume-title":"Granular Computing, 2006 IEEE International Conference","author":"Meng","year":"2006"},{"key":"2023012508244672200_B19","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1093\/bioinformatics\/btg295","article-title":"A new sequence distance measure for phylogenetic tree construction","volume":"19","author":"Otu","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508244672200_B20","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison-a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508244672200_B21","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Wang","year":"2007","journal-title":"Appl. Environ. 
Microbiol"},{"key":"2023012508244672200_B22","volume-title":"The Practice of Statistics","author":"Yates","year":"2007","edition":"3rd"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/18\/2235\/48858574\/bioinformatics_26_18_2235.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/18\/2235\/48858574\/bioinformatics_26_18_2235.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:25:06Z","timestamp":1674635106000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/18\/2235\/204832"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,7,12]]},"references-count":22,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2010,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq349","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,9,15]]},"published":{"date-parts":[[2010,7,12]]}}}