{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T01:05:02Z","timestamp":1773277502822,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r.<\/jats:p>\n               <jats:p>Results: The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers the systematic and quantitative experimental assessment of our measures. Moreover, we compared our achievements with these based on alignment or alignment-free. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating curve) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measure is used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures intending to incorporate k-word distributions into Markov model are more efficient.<\/jats:p>\n               <jats:p>Availability: The software, data and supplement material are freely available at http:\/\/math.dlut.edu.cn\/daiqi\/mplusd.html.<\/jats:p>\n               <jats:p>Contact: \u00a0daiailiu2004@yahoo.com.cn<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn436","type":"journal-article","created":{"date-parts":[[2008,8,19]],"date-time":"2008-08-19T00:14:03Z","timestamp":1219104843000},"page":"2296-2302","source":"Crossref","is-referenced-by-count":63,"title":["Markov model plus <i>k<\/i>-word distributions: a synergy that produces novel statistical measures for sequence comparison"],"prefix":"10.1093","volume":"24","author":[{"given":"Qi","family":"Dai","sequence":"first","affiliation":[{"name":"Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanchun","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianming","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2008,8,18]]},"reference":[{"key":"2023020211260211800_B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023020211260211800_B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023020211260211800_B3","doi-asserted-by":"crossref","first-page":"5155","DOI":"10.1073\/pnas.83.14.5155","article-title":"A measure of the similarity of sets of sequences not requiring sequence alignment","volume":"83","author":"Blaisdell","year":"1986","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020211260211800_B4","doi-asserted-by":"crossref","first-page":"1145","DOI":"10.1016\/S0031-3203(96)00142-2","article-title":"The use of the area under the ROC curve in the evaluation of machine learning algorithms","volume":"30","author":"Bradley","year":"1997","journal-title":"Pattern Recog."},{"key":"2023020211260211800_B5","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1007\/PL00006389","article-title":"Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders","volume":"47","author":"Cao","year":"1998","journal-title":"J. Mol. Evol."},{"key":"2023020211260211800_B6","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis","author":"Durbin","year":"1998"},{"key":"2023020211260211800_B7","volume-title":"Signal Detection Theory and ROC-Analysis","author":"Egan","year":"1975"},{"key":"2023020211260211800_B8","first-page":"164","article-title":"PHYLIP-Phylogeny inference package (version 3.2)","volume":"5","author":"Felsenstein","year":"1989","journal-title":"Cladistics"},{"key":"2023020211260211800_B9","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1016\/S0076-6879(96)66026-1","article-title":"Inferring phylogenies from protein sequences by parsimony, distance and likelihood methods","volume":"266","author":"Felsenstein","year":"1996","journal-title":"Meth. Enzymol."},{"key":"2023020211260211800_B10","first-page":"287","article-title":"Statistical method for predicting protein coding regions in nucleic acid sequences","volume":"3","author":"Fichant","year":"1987","journal-title":"Comput. Appl. Biosci."},{"key":"2023020211260211800_B11","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1093\/bioinformatics\/bti794","article-title":"REDfly: a regulatory element database for Drosophila","volume":"22","author":"Gallo","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B12","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/JPROC.2002.805303","article-title":"Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison","volume":"90","author":"Green","year":"2002","journal-title":"Proc. IEEE"},{"key":"2023020211260211800_B13","doi-asserted-by":"crossref","first-page":"754","DOI":"10.1093\/bioinformatics\/17.8.754","article-title":"MRBAYES: Bayesian inference of phylogenetic trees","volume":"17","author":"Huelsenbeck","year":"2001","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B14","volume-title":"Markov Chains Theory and It's Applications","author":"Isaacson","year":"1976"},{"key":"2023020211260211800_B15","doi-asserted-by":"crossref","first-page":"i249","DOI":"10.1093\/bioinformatics\/btm211","article-title":"A statistical method for alignment-free comparison of regulatory sequences","volume":"23","author":"Kantorovitz","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B16","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1055\/s-2001-15821","article-title":"Phylogenetic analysis based on 18S rRNA gene and matK gene sequences of Panax vietnamensis and five related species","volume":"67","author":"Komatsu","year":"2001","journal-title":"Planta Med."},{"key":"2023020211260211800_B17","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"2023020211260211800_B18","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1093\/bib\/5.2.150","article-title":"MEGA3: integrated software for molecular evolutionary genetics analysis and sequence alignment","volume":"5","author":"Kumar","year":"2004","journal-title":"Brief. Bioinform."},{"key":"2023020211260211800_B19","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1093\/bioinformatics\/17.2.149","article-title":"An information-based sequence distance and its application to whole mitochondrial genome phylogeny","volume":"17","author":"Li","year":"2001","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B20","doi-asserted-by":"crossref","first-page":"13980","DOI":"10.1073\/pnas.202468099","article-title":"Distributional regimes for the number of k-word matches between two random sequences","volume":"99","author":"Lippert","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020211260211800_B21","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/S0378-1119(02)00695-9","article-title":"Pika and vole mitochondrial genomes increase support for both rodent monophyly and Glires","volume":"294","author":"Lin","year":"2002","journal-title":"Gene"},{"key":"2023020211260211800_B22","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1016\/j.compbiolchem.2004.03.002","article-title":"Cluster-C: an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques","volume":"28","author":"Mohseni-Zadeh","year":"2004","journal-title":"Comput. Biol. Chem."},{"key":"2023020211260211800_B23","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1093\/bioinformatics\/btg295","article-title":"A new sequence distance measure for phylogenetic tree construction","volume":"19","author":"Otu","year":"2003","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B24","doi-asserted-by":"crossref","first-page":"3455","DOI":"10.1093\/bioinformatics\/bth426","article-title":"A probabilistic measure for alignment-free sequence comparison","volume":"20","author":"Pham","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B25","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1016\/j.patcog.2006.02.026","article-title":"Spectral distortion measures for biological sequence comparisons and database searching","volume":"40","author":"Pham","year":"2007","journal-title":"Pattern Recog."},{"key":"2023020211260211800_B26","doi-asserted-by":"crossref","first-page":"S182","DOI":"10.1093\/bioinformatics\/18.suppl_2.S182","article-title":"ProClust: improved clustering of protein sequences with an extended graph-based approach","volume":"18","author":"Pipenbacher","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B27","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1093\/oxfordjournals.molbev.a026379","article-title":"Where do rodents fit? Evidence from the complete mitochondrial genome of Sciurus vulgaris","volume":"17","author":"Reyes","year":"2000","journal-title":"Mol. Biol. Evol."},{"key":"2023020211260211800_B28","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1093\/bioinformatics\/btg180","article-title":"MrBayes 3: Bayesian phylogenetic inference under mixed models","volume":"19","author":"Ronquist","year":"2003","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B29","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1093\/bioinformatics\/18.1.100","article-title":"Integrated gene and species phylogenies from unaligned whole genome protein sequences","volume":"18","author":"Stuart","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B30","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1093\/bioinformatics\/btg425","article-title":"Metrics for comparing regulatory sequences on the basis of pattern counts","volume":"20","author":"Van Helden","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B31","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison-a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023020211260211800_B32","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1006\/mpev.1997.0452","article-title":"General time reversible distances with unequal rates across sites: mixing \u03b3 and inverse Gaussian distributions with invariant sites","volume":"8","author":"Waddell","year":"1997","journal-title":"Mol. Phyl. Evol."},{"key":"2023020211260211800_B33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/sysbio\/48.1.1","article-title":"Towards resolving the interordinal relationships of placental mammals","volume":"48","author":"Waddell","year":"1999","journal-title":"Syst. Biol."},{"key":"2023020211260211800_B34","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1080\/106351599260481","article-title":"Assessing the Cretaceous superordinal divergence times within birds and placental mammals using whole mitochondrial protein sequences and an extended statistical framework","volume":"8","author":"Waddell","year":"1999","journal-title":"Syst. Biol."},{"key":"2023020211260211800_B35","first-page":"141","article-title":"A phylogenetic foundation for comparative mammalian genomics","volume":"12","author":"Waddell","year":"2001","journal-title":"Genome Inform. Ser."},{"key":"2023020211260211800_B36","volume-title":"Introduction to Computational Biology: Maps, Sequences, and Genomes: Interdisciplinary Statistics","author":"Waterman","year":"1995"},{"key":"2023020211260211800_B37","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.2307\/2533509","article-title":"A measure of DNA sequence dissimilarity based on Mahalanobis distance between frequencies of words","volume":"53","author":"Wu","year":"1997","journal-title":"Biometrics"},{"key":"2023020211260211800_B38","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1111\/j.0006-341X.2001.00441.x","article-title":"Statistical measures of DNA dissimilarity under Markov chain models of base composition","volume":"57","author":"Wu","year":"2001","journal-title":"Biometrics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/20\/2296\/49052652\/bioinformatics_24_20_2296.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/20\/2296\/49052652\/bioinformatics_24_20_2296.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T14:15:01Z","timestamp":1675347301000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/20\/2296\/259406"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,18]]},"references-count":38,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2008,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn436","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,10,15]]},"published":{"date-parts":[[2008,8,18]]}}}