{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,7,3]],"date-time":"2023-07-03T11:53:30Z","timestamp":1688385210469},"reference-count":17,"publisher":"Oxford University Press (OUP)","issue":"16","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Position weight matrices (PWMs) have become a standard for representing biological sequence motifs. Their relative simplicity has favoured the development of efficient algorithms for diverse tasks such as motif identification, sequence scanning and statistical significance evaluation. Markov chain-based models generalize the PWM model by allowing for inter-position dependencies to be considered, at the cost of substantial computational overhead, which may limit their application.<\/jats:p>\n               <jats:p>Results: In this article, we consider two aspects regarding the use of higher-order Markov models for biological sequence motifs, namely, the representation and the computation of P-values for motifs described by a set of occurrences. 
We propose an efficient representation based on the use of tries, from which empirical position-specific conditional base probabilities can be computed, and extend state-of-the-art PWM-based algorithms to allow for the computation of exact P-values for high-order Markov motif models.<\/jats:p>\n               <jats:p>Availability: The software is available in the form of a Java object-oriented library from http:\/\/www.cin.ufpe.br\/~paguso\/kmarkov.<\/jats:p>\n               <jats:p>Contact: paguso@cin.ufpe.br<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn282","type":"journal-article","created":{"date-parts":[[2008,8,9]],"date-time":"2008-08-09T13:08:02Z","timestamp":1218287282000},"page":"i160-i166","source":"Crossref","is-referenced-by-count":4,"title":["Efficient representation and <i>P<\/i>-value computation for high-order Markov motifs"],"prefix":"10.1093","volume":"24","author":[{"given":"Paulo G. S.","family":"da Fonseca","sequence":"first","affiliation":[{"name":"1 Centro de Inform\u00e1tica, Universidade Federal de Pernambuco, 50732-970, Recife, Brazil and 2Universit\u00e9 de Lyon, F-69000, Lyon; Universit\u00e9 Lyon 1; INRIA Rh\u00f4ne-Alpes; CNRS, UMR5558, Laboratoire de Biom\u00e9trie et Biologie Evolutive, F-69622, Villeurbanne, France"}]},{"given":"Katia S.","family":"Guimar\u00e3es","sequence":"additional","affiliation":[{"name":"1 Centro de Inform\u00e1tica, Universidade Federal de Pernambuco, 50732-970, Recife, Brazil and 2Universit\u00e9 de Lyon, F-69000, Lyon; Universit\u00e9 Lyon 1; INRIA Rh\u00f4ne-Alpes; CNRS, UMR5558, Laboratoire de Biom\u00e9trie et Biologie Evolutive, F-69622, Villeurbanne, 
France"}]},{"given":"Marie-France","family":"Sagot","sequence":"additional","affiliation":[{"name":"1 Centro de Inform\u00e1tica, Universidade Federal de Pernambuco, 50732-970, Recife, Brazil and 2Universit\u00e9 de Lyon, F-69000, Lyon; Universit\u00e9 Lyon 1; INRIA Rh\u00f4ne-Alpes; CNRS, UMR5558, Laboratoire de Biom\u00e9trie et Biologie Evolutive, F-69622, Villeurbanne, France"}]}],"member":"286","published-online":{"date-parts":[[2008,8,9]]},"reference":[{"key":"2023020210500835900_B1","first-page":"28","article-title":"Fitting a mixture model by expectation maximization to discover motifs in biopolymers","volume":"2","author":"Bailey","year":"1994","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol"},{"key":"2023020210500835900_B2","first-page":"28","article-title":"Modeling dependencies in protein-DNA binding sites","volume-title":"In Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology (RECOMB\u201903).","author":"Barash","year":"2003"},{"key":"2023020210500835900_B3","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1186\/1471-2105-7-389","article-title":"Fast index based algorithms and software for matching position specific scoring matrices","volume":"7","author":"Beckstette","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023020210500835900_B4","first-page":"38","article-title":"Efficient exact P-value computation and applications to biosequence analysis","volume-title":"In Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology (RECOMB\u201903).","author":"Bejerano","year":"2003"},{"key":"2023020210500835900_B5","volume-title":"Biological sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.","author":"Durbin","year":"1999"},{"key":"2023020210500835900_B6","doi-asserted-by":"crossref","first-page":"S100","DOI":"10.1093\/bioinformatics\/18.suppl_2.S100","article-title":"Identifying transcription factor 
binding sites through markov chain optimization","volume":"18","author":"Ellrott","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020210500835900_B7","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1145\/367390.367400","article-title":"Trie memory","volume":"3","author":"Fredkin","year":"1960","journal-title":"Comm. ACM"},{"key":"2023020210500835900_B8","doi-asserted-by":"crossref","first-page":"3585","DOI":"10.1093\/nar\/gkl372","article-title":"Computational identification of transcriptional regulatory elements in DNA sequence","volume":"34","author":"GuhaThakurta","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020210500835900_B9","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1186\/1471-2105-7-279","article-title":"Optimized mixed markov models for motif identification","volume":"7","author":"Huang","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023020210500835900_B10","article-title":"Sorting and searching","volume-title":"In The Art of Computer Programming, vol. 3 of The Art of Computer Programming.","author":"Knuth","year":"1998","edition":"2nd edn"},{"key":"2023020210500835900_B11","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1126\/science.8211139","article-title":"Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment","volume":"262","author":"Lawrence","year":"1993","journal-title":"Science"},{"key":"2023020210500835900_B12","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-662-07807-5","volume-title":"How to Solve it: Modern Heuristics.","author":"Michalewicz","year":"2004"},{"key":"2023020210500835900_B13","first-page":"239","article-title":"Fast search algorithms for position specific scoring matrices","volume-title":"In Proceedings of the Bioinformatics Research and Development BIRD 2007, vol. 
4414 of Lecture Notes in Bioinformatics.","author":"Pizzi","year":"2007"},{"key":"2023020210500835900_B14","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/1748-7188-2-15","article-title":"Efficient and accurate P-value computation for position weight matrices","volume":"2","author":"Touzet","year":"2007","journal-title":"Algorithms Mol. Biol"},{"key":"2023020210500835900_B15","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1093\/nar\/28.1.316","article-title":"Transfac: an integrated system for gene expression regulation","volume":"28","author":"Wingender","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023020210500835900_B16","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1093\/bioinformatics\/btl662","article-title":"Computing exact P-values for DNA motifs","volume":"23","author":"Zhang","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020210500835900_B17","doi-asserted-by":"crossref","first-page":"894","DOI":"10.1089\/cmb.2005.12.894","article-title":"Finding short DNA motifs using permuted markov models","volume":"12","author":"Zhao","year":"2005","journal-title":"J. Comput. 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/16\/i160\/49051605\/bioinformatics_24_16_i160.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/16\/i160\/49051605\/bioinformatics_24_16_i160.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T12:47:16Z","timestamp":1675342036000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/16\/i160\/200465"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,9]]},"references-count":17,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2008,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn282","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,8,15]]},"published":{"date-parts":[[2008,8,9]]}}}