{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,2]],"date-time":"2025-02-02T21:40:04Z","timestamp":1738532404783,"version":"3.35.0"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The modeling of conservation patterns in genomic DNA has become increasingly popular for a number of bioinformatic applications. While several systems developed to date incorporate context-dependence in their substitution models, the impact on computational complexity and generalization ability of the resulting higher order models invites the question of whether simpler approaches to context modeling might permit appreciable reductions in model complexity and computational cost, without sacrificing prediction accuracy.<\/jats:p><jats:p>Results: We formulate several alternative methods for context modeling based on windowed Bayesian networks, and compare their effects on both accuracy and computational complexity for the task of discriminating functionally distinct segments in vertebrate DNA. 
Our results show that substantial reductions in the complexity of both the model and the associated inference algorithm can be achieved without reducing predictive accuracy.<\/jats:p><jats:p>Contact: \u00a0bmajoros@duke.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn598","type":"journal-article","created":{"date-parts":[[2008,11,19]],"date-time":"2008-11-19T03:17:57Z","timestamp":1227064677000},"page":"175-182","source":"Crossref","is-referenced-by-count":1,"title":["Complexity reduction in context-dependent DNA substitution models"],"prefix":"10.1093","volume":"25","author":[{"given":"William H.","family":"Majoros","sequence":"first","affiliation":[{"name":"1 Institute for Genome Sciences & Policy and 2Department of Biostatistics & Bioinformatics, Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Uwe","family":"Ohler","sequence":"additional","affiliation":[{"name":"1 Institute for Genome Sciences & Policy and 2Department of Biostatistics & Bioinformatics, Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2008,11,18]]},"reference":[{"key":"2023013110003374200_B1","doi-asserted-by":"crossref","first-page":"2322","DOI":"10.1093\/bioinformatics\/bti376","article-title":"Identification and measurement of neighbor-dependent nucleotide substitution processes","volume":"21","author":"Arndt","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110003374200_B2","doi-asserted-by":"crossref","first-page":"1283","DOI":"10.1126\/science.287.5456.1283","article-title":"Evidence for a high frequency of simultaneous double-nucleotide 
substitutions","volume":"287","author":"Averof","year":"2000","journal-title":"Science"},{"key":"2023013110003374200_B3","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1101\/gr.1960404","article-title":"MAVID: constrained ancestral alignment of multiple sequences","volume":"14","author":"Bray","year":"2004","journal-title":"Genome Res"},{"key":"2023013110003374200_B4","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis.","author":"Durbin","year":"1998"},{"key":"2023013110003374200_B5","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.1105136","article-title":"The ENCODE (ENCyclopedia Of DNA Elements) project","volume":"306","author":"The ENCODE Project Consortium","year":"2004","journal-title":"Science"},{"key":"2023013110003374200_B6","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1007\/BF01734359","article-title":"Evolutionary trees from DNA sequences","volume":"17","author":"Felsenstein","year":"1981","journal-title":"J. Mol. Evol"},{"key":"2023013110003374200_B7","doi-asserted-by":"crossref","first-page":"406","DOI":"10.2307\/2412116","article-title":"Toward defining the course of evolution: minimum change for a specific tree topology","volume":"20","author":"Fitch","year":"1971","journal-title":"Syst. Zool"},{"key":"2023013110003374200_B8","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1126\/science.1094068","article-title":"Inferring cellular networks using probabilistic graphical models","volume":"303","author":"Friedman","year":"2004","journal-title":"Science"},{"key":"2023013110003374200_B9","first-page":"725","article-title":"A codon-based model of nucleotide substitution for protein-coding DNA sequences","volume":"11","author":"Goldman","year":"1994","journal-title":"Mol. Biol. 
Evol"},{"key":"2023013110003374200_B10","first-page":"374","article-title":"Using multiple alignments to improve gene prediction","volume-title":"Lecture Notes in Computer Science","author":"Gross","year":"2005"},{"key":"2023013110003374200_B11","first-page":"350","article-title":"Using multiple alignments and phylogenetic trees to detect RNA secondary structure","author":"Gulko","year":"1996"},{"key":"2023013110003374200_B12","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2006-7-s1-s4","article-title":"GENCODE: producing a reference annotation for ENCODE.","volume":"7","author":"Harrow","year":"2006","journal-title":"Genome Biol"},{"key":"2023013110003374200_B13","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1007\/BF02101694","article-title":"Dating of the human-ape splitting by a molecular clock of mitochondrial DNA","volume":"22","author":"Hasegawa","year":"1985","journal-title":"J. Mol. Evol"},{"key":"2023013110003374200_B14","first-page":"301","article-title":"A tutorial on learning with Bayesian networks","volume-title":"Learning in Graphical Models.","author":"Heckerman","year":"1999"},{"key":"2023013110003374200_B15","doi-asserted-by":"crossref","first-page":"13994","DOI":"10.1073\/pnas.0404142101","article-title":"Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution","volume":"101","author":"Hwang","year":"2004","journal-title":"PNAS"},{"key":"2023013110003374200_B16","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1239\/aap\/1013540176","article-title":"Probabilistic models of DNA sequence evolution with context dependent rates of substitution","volume":"32","author":"Jensen","year":"2000","journal-title":"Adv. Appl. 
Prob"},{"key":"2023013110003374200_B17","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1093\/bioinformatics\/bth917","article-title":"Efficient approximations for learning phylogenetic HMM models from data","volume":"20","author":"Jojic","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110003374200_B18","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/B978-1-4832-3211-9.50009-7","article-title":"Evolution of protein molecules","volume-title":"Mammalian protein metabolism.","author":"Jukes","year":"1969"},{"key":"2023013110003374200_B19","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at UCSC","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res."},{"key":"2023013110003374200_B20","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1109\/18.910572","article-title":"Factor graphs and the sum-product algorithm","volume":"47","author":"Kschischang","year":"2001","journal-title":"IEEE Trans. Inf. Theory"},{"key":"2023013110003374200_B21","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1007\/BF01731581","article-title":"A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences","volume":"16","author":"Kimura","year":"1980","journal-title":"J. Mol. Evol."},{"key":"2023013110003374200_B22","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1111\/j.2517-6161.1988.tb01721.x","article-title":"Local computations with probabilities on graphical structures and their application to expert systems","volume":"50","author":"Lauritzen","year":"1988","journal-title":"J. R. Statist. Soc. 
B"},{"key":"2023013110003374200_B23","doi-asserted-by":"crossref","first-page":"1850","DOI":"10.1093\/bioinformatics\/bth153","article-title":"Multiple-sequence functional annotation and the generalized hidden Markov phylogeny","volume":"20","author":"McAuliffe","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110003374200_B24","first-page":"324","article-title":"Phylogenetic motif detection by expectation-maximization on evolutionary mixtures","author":"Moses","year":"2004"},{"key":"2023013110003374200_B25","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1093\/bioinformatics\/15.5.362","article-title":"Interpolated Markov chains for eukaryotic promoter recognition","volume":"15","author":"Ohler","year":"1999","journal-title":"Bioinformatics"},{"volume-title":"Probabilistic Reasoning in Intelligent Systems.","year":"1988","author":"Pearl","key":"2023013110003374200_B26"},{"key":"2023013110003374200_B27","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1093\/bioinformatics\/19.2.219","article-title":"Gene finding with a hidden Markov model of genome structure and evolution","volume":"19","author":"Pedersen","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013110003374200_B28","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/S0168-9525(00)02208-3","article-title":"Restricted wobble rules for eukaryotic genomes","volume":"17","author":"Percudani","year":"2001","journal-title":"Trends Genet"},{"key":"2023013110003374200_B29","first-page":"406","article-title":"The neighbor-joining method: a new method for reconstructing phylogenetic trees","volume":"4","author":"Saitou","year":"1987","journal-title":"Mol. Biol. 
Evol."},{"key":"2023013110003374200_B30","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1006\/geno.1999.5854","article-title":"Interpolated Markov models for eukaryotic gene finding","volume":"59","author":"Salzberg","year":"1998","journal-title":"Genomics"},{"key":"2023013110003374200_B31","doi-asserted-by":"crossref","first-page":"1534","DOI":"10.1093\/oxfordjournals.molbev.a004216","article-title":"Codon and rate variation models in molecular phylogeny","volume":"19","author":"Schadt","year":"2002","journal-title":"Mol. Biol. Evol"},{"key":"2023013110003374200_B32","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1093\/molbev\/msj021","article-title":"Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences","volume":"23","author":"Shapiro","year":"2006","journal-title":"Mol. Biol. Evol"},{"key":"2023013110003374200_B33","doi-asserted-by":"crossref","first-page":"e67","DOI":"10.1371\/journal.pcbi.0010067","article-title":"PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny","volume":"1","author":"Siddharthan","year":"2005","journal-title":"PLoS Comp. Biol"},{"key":"2023013110003374200_B34","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1089\/1066527041410472","article-title":"Combining phylogenetic and hidden Markov models in biosequence analysis","volume":"11","author":"Siepel","year":"2004","journal-title":"J. Comp. Biol"},{"key":"2023013110003374200_B35","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1093\/molbev\/msh039","article-title":"Phylogenetic estimation of context-dependent substitution rates by maximum likelihood","volume":"21","author":"Siepel","year":"2004","journal-title":"Mol. Biol. Evol"},{"key":"2023013110003374200_B36","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1093\/molbev\/msg003","article-title":"A low rate of simultaneous double-nucleotide mutations in primates","volume":"20","author":"Smith","year":"2003","journal-title":"Mol. 
Biol. Evol"},{"key":"2023013110003374200_B37","first-page":"57","article-title":"Some probabilistic and statistical problems in the analysis of DNA sequences","volume":"17","author":"Tavar\u00e9","year":"1986","journal-title":"Lect. Math. Life Sci"},{"key":"2023013110003374200_B38","doi-asserted-by":"crossref","first-page":"1596","DOI":"10.1101\/gr.4537706","article-title":"ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements","volume":"16","author":"Taylor","year":"2006","journal-title":"Genome Res."},{"key":"2023013110003374200_B39","doi-asserted-by":"crossref","first-page":"2027","DOI":"10.1534\/genetics.103.023226","article-title":"Estimating the frequency of events that cause multiple-nucleotide changes","volume":"167","author":"Whelan","year":"2004","journal-title":"Genetics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/2\/175\/48982820\/bioinformatics_25_2_175.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/2\/175\/48982820\/bioinformatics_25_2_175.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,2]],"date-time":"2025-02-02T21:07:44Z","timestamp":1738530464000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/2\/175\/218526"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,11,18]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2009,1,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn598","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2009,1,15]]},"
published":{"date-parts":[[2008,11,18]]}}}