{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T14:40:15Z","timestamp":1771512015611,"version":"3.50.1"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2020,6,9]],"date-time":"2020-06-09T00:00:00Z","timestamp":1591660800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Alignment-free, stochastic models derived from k-mer distributions representing reference genome sequences have a rich history in the classification of DNA sequences. In particular, the variants of Markov models have previously been used extensively. Higher-order Markov models have been used with caution, perhaps sparingly, primarily because of the lack of enough training data and computational power. Advances in sequencing technology and computation have enabled exploitation of the predictive power of higher-order models. We, therefore, revisited higher-order Markov models and assessed their performance in classifying metagenomic sequences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Comparative assessment of higher-order models (HOMs, 9th order or higher) with interpolated Markov model, interpolated context model and lower-order models (8th order or lower) was performed on metagenomic datasets constructed using sequenced prokaryotic genomes. Our results show that HOMs outperform other models in classifying metagenomic fragments as short as 100\u2009nt at all taxonomic ranks, and at lower ranks when the fragment size was increased to 250\u2009nt. HOMs were also found to be significantly more accurate than local alignment which is widely relied upon for taxonomic classification of metagenomic sequences. A novel software implementation written in C++ performs classification faster than the existing Markovian metagenomic classifiers and can therefore be used as a standalone classifier or in conjunction with existing taxonomic classifiers for more robust classification of metagenomic sequences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The software has been made available at https:\/\/github.com\/djburks\/SMM.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Contact<\/jats:title>\n                  <jats:p>Rajeev.Azad@unt.edu<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa562","type":"journal-article","created":{"date-parts":[[2020,6,3]],"date-time":"2020-06-03T11:14:19Z","timestamp":1591182859000},"page":"4130-4136","source":"Crossref","is-referenced-by-count":11,"title":["Higher-order Markov models for metagenomic sequence classification"],"prefix":"10.1093","volume":"36","author":[{"given":"David J","family":"Burks","sequence":"first","affiliation":[{"name":"Department of Biological Sciences and BioDiscovery Institute"}]},{"given":"Rajeev K","family":"Azad","sequence":"additional","affiliation":[{"name":"Department of Biological Sciences and BioDiscovery Institute"},{"name":"Department of Mathematics, University of North Texas , Denton, TX 76203, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,6,9]]},"reference":[{"key":"2023062213544945400_btaa562-B1","first-page":"1649","article-title":"k-SLAM: accurate and ultra-fast taxonomic classification and gene identification for large metagenomic data sets","volume":"45","author":"Ainsworth","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023062213544945400_btaa562-B3","doi-asserted-by":"crossref","first-page":"e94","DOI":"10.1093\/nar\/gks251","article-title":"Grinder: a versatile amplicon and shotgun sequence simulator","volume":"40","author":"Angly","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B4","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1093\/bioinformatics\/bth028","article-title":"Effects of choice of DNA sequence model structure on gene identification accuracy","volume":"20","author":"Azad","year":"2004","journal-title":"Bioinformatics"},{"key":"2023062213544945400_btaa562-B5","doi-asserted-by":"crossref","first-page":"2607","DOI":"10.1093\/nar\/29.12.2607","article-title":"GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions","volume":"29","author":"Besemer","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B6","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1098\/rspb.2009.1679","article-title":"Horizontal gene transfer in evolution: facts and challenges","volume":"277","author":"Boto","year":"2010","journal-title":"Proc. R. Soc. B Biol. Sci"},{"key":"2023062213544945400_btaa562-B7","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1038\/nmeth.1358","article-title":"Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models","volume":"6","author":"Brady","year":"2009","journal-title":"Nat. Methods"},{"key":"2023062213544945400_btaa562-B8","doi-asserted-by":"crossref","first-page":"7762","DOI":"10.1093\/nar\/gkv784","article-title":"High speed BLASTN: an accelerated MegaBLAST search tool","volume":"43","author":"Chen","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B9","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1101\/gr.225276.117","article-title":"taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time","volume":"28","author":"Corvelo","year":"2018","journal-title":"Genome Res"},{"key":"2023062213544945400_btaa562-B10","doi-asserted-by":"crossref","first-page":"4636","DOI":"10.1093\/nar\/27.23.4636","article-title":"Improved microbial gene identification with GLIMMER","volume":"27","author":"Delcher","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B11","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1016\/j.cell.2015.08.059","article-title":"Regulators of gut motility revealed by a gnotobiotic model of diet-microbiome interactions related to travel","volume":"163","author":"Dey","year":"2015","journal-title":"Cell"},{"key":"2023062213544945400_btaa562-B12","first-page":"161","author":"Essen","year":"1992"},{"key":"2023062213544945400_btaa562-B13","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B14","doi-asserted-by":"crossref","first-page":"D279","DOI":"10.1093\/nar\/gkv1344","article-title":"The Pfam protein families database: towards a more sustainable future","volume":"44","author":"Finn","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B15","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1038\/s41579-018-0097-x","article-title":"The majority is uncultured","volume":"16","author":"Hofer","year":"2018","journal-title":"Nat. Rev. Microbiol"},{"key":"2023062213544945400_btaa562-B16","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1111\/j.1574-6976.2008.00136.x","article-title":"Genomic islands: tools of bacterial horizontal gene transfer and evolution","volume":"33","author":"Juhas","year":"2009","journal-title":"FEMS Microbiol. Rev"},{"key":"2023062213544945400_btaa562-B17","doi-asserted-by":"crossref","first-page":"e9","DOI":"10.1093\/nar\/gkr1067","article-title":"Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering","volume":"40","author":"Kelley","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B18","doi-asserted-by":"crossref","first-page":"1805","DOI":"10.12688\/f1000research.8737.1","article-title":"Horizontal gene transfer: essentiality and evolvability in prokaryotes, and roles in evolutionary transitions","volume":"5","author":"Koonin","year":"2016","journal-title":"F1000Research"},{"key":"2023062213544945400_btaa562-B19","doi-asserted-by":"crossref","first-page":"R23","DOI":"10.1186\/gb-2009-10-2-r23","article-title":"PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data","volume":"10","author":"Korbel","year":"2009","journal-title":"Genome Biol"},{"key":"2023062213544945400_btaa562-B20","first-page":"348","author":"Kuhn","year":"1988"},{"key":"2023062213544945400_btaa562-B21","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"The sequence read archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B22","doi-asserted-by":"crossref","first-page":"e00055","DOI":"10.1128\/mSystems.00055-18","article-title":"Phylogenetically novel uncultured microbial cells dominate earth microbiomes","volume":"3","author":"Lloyd","year":"2018","journal-title":"mSystems"},{"key":"2023062213544945400_btaa562-B23","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1093\/nar\/26.4.1107","article-title":"GeneMark.hmm: new solutions for gene finding","volume":"26","author":"Lukashin","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/ncomms11257","article-title":"Fast and sensitive taxonomic classification for metagenomics with Kaiju","volume":"7","author":"Menzel","year":"2016","journal-title":"Nat. Commun"},{"key":"2023062213544945400_btaa562-B25","doi-asserted-by":"crossref","first-page":"D726","DOI":"10.1093\/nar\/gkx967","article-title":"EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies","volume":"46","author":"Mitchell","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B26","doi-asserted-by":"crossref","first-page":"e3788","DOI":"10.7717\/peerj.3788","article-title":"MetaCRAST: reference-guided extraction of CRISPR spacers from unassembled metagenomes","volume":"5","author":"Moller","year":"2017","journal-title":"PeerJ"},{"key":"2023062213544945400_btaa562-B27","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1471-2105-12-41","article-title":"RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles","volume":"12","author":"Nalbantoglu","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023062213544945400_btaa562-B28","author":"Ney","year":"1991"},{"key":"2023062213544945400_btaa562-B29","doi-asserted-by":"crossref","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O\u2019Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B30","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1186\/s12864-015-1419-2","article-title":"CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers","volume":"16","author":"Ounit","year":"2015","journal-title":"BMC Genomics"},{"key":"2023062213544945400_btaa562-B31","doi-asserted-by":"crossref","first-page":"3455","DOI":"10.1093\/bioinformatics\/bth426","article-title":"A probabilistic measure for alignment-free sequence comparison","volume":"20","author":"Pham","year":"2004","journal-title":"Bioinformatics"},{"key":"2023062213544945400_btaa562-B32","doi-asserted-by":"crossref","first-page":"W116","DOI":"10.1093\/nar\/gki442","article-title":"InterProScan: protein domains identifier","volume":"33","author":"Quevillon","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B33","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/MASSP.1986.1165342","article-title":"An introduction to hidden Markov models","volume":"3","author":"Rabiner","year":"1986","journal-title":"IEEE ASSP Mag"},{"key":"2023062213544945400_btaa562-B34","doi-asserted-by":"crossref","first-page":"544","DOI":"10.1093\/nar\/26.2.544","article-title":"Microbial gene identification using interpolated Markov models","volume":"26","author":"Salzberg","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2023062213544945400_btaa562-B35","first-page":"81","author":"Saul","year":"1997"},{"key":"2023062213544945400_btaa562-B36","doi-asserted-by":"crossref","first-page":"e105067","DOI":"10.1371\/journal.pone.0105067","article-title":"Profile hidden Markov models for the detection of viruses within metagenomic sequence data","volume":"9","author":"Skewes-Cox","year":"2014","journal-title":"PLoS One"},{"key":"2023062213544945400_btaa562-B37","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison \u2013 a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023062213544945400_btaa562-B38","doi-asserted-by":"crossref","first-page":"2487","DOI":"10.1093\/bioinformatics\/btt403","article-title":"nhmmer: DNA homology search with profile HMMs","volume":"29","author":"Wheeler","year":"2013","journal-title":"Bioinformatics"},{"key":"2023062213544945400_btaa562-B39","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2023062213544945400_btaa562-B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: benefits, applications, and tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa562\/33479764\/btaa562.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/14\/4130\/50677168\/btaa562.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/14\/4130\/50677168\/btaa562.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,23]],"date-time":"2023-06-23T16:21:16Z","timestamp":1687537276000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/14\/4130\/5855128"}},"subtitle":[],"editor":[{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,6,9]]},"references-count":40,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2020,7,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa562","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,7,15]]},"published":{"date-parts":[[2020,6,9]]}}}