{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,3]],"date-time":"2024-08-03T20:19:14Z","timestamp":1722716354301},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog.<\/jats:p>\n               <jats:p>Results: We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable.<\/jats:p>\n               <jats:p>Availability: The source code of all PRISM models, data and additional scripts are freely available for download at: http:\/\/github.com\/somork\/codonhmm.<\/jats:p>\n               <jats:p>Contact: \u00a0soer@ruc.dk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr698","type":"journal-article","created":{"date-parts":[[2012,1,4]],"date-time":"2012-01-04T05:38:22Z","timestamp":1325655502000},"page":"636-642","source":"Crossref","is-referenced-by-count":11,"title":["Evaluating bacterial gene-finding HMM structures as probabilistic logic programs"],"prefix":"10.1093","volume":"28","author":[{"given":"S\u00f8ren","family":"M\u00f8rk","sequence":"first","affiliation":[{"name":"1 Department of Science, Systems and Models, Roskilde University, 4000 Roskilde, Denmark and 2Department of Bioengineering, University of California, Berkeley, CA 94720, USA"}]},{"given":"Ian","family":"Holmes","sequence":"additional","affiliation":[{"name":"1 Department of Science, Systems and Models, Roskilde University, 4000 Roskilde, Denmark and 2Department of Bioengineering, University of California, Berkeley, CA 94720, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,1,3]]},"reference":[{"key":"2023012512191747500_B1","doi-asserted-by":"crossref","first-page":"3911","DOI":"10.1093\/nar\/27.19.3911","article-title":"Heuristic approach to deriving models for gene finding","volume":"27","author":"Besemer","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B2","doi-asserted-by":"crossref","first-page":"2607","DOI":"10.1093\/nar\/29.12.2607","article-title":"GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions","volume":"29","author":"Besemer","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B3","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1126\/science.277.5331.1453","article-title":"The complete genome sequence of Escherichia coli K-12","volume":"277","author":"Blattner","year":"1997","journal-title":"Science"},{"key":"2023012512191747500_B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0166-5316(03)00044-0","article-title":"Acyclic discrete phase type distributions: properties and a parameter estimation algorithm","volume":"54","author":"Bobbio","year":"2003","journal-title":"Perform. Eval."},{"key":"2023012512191747500_B5","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/0097-8485(93)85004-V","article-title":"GENMARK: parallel gene recognition for both DNA strands","volume":"17","author":"Borodovsky","year":"1993","journal-title":"Comput. Chem."},{"key":"2023012512191747500_B6","doi-asserted-by":"crossref","first-page":"3258","DOI":"10.1093\/bioinformatics\/btm402","article-title":"Transducers: an emerging probabilistic framework for modeling indels on trees","volume":"23","author":"Bradley","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512191747500_B7","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1006\/jmbi.1997.0951","article-title":"Prediction of complete gene structures in human genomic dna","volume":"268","author":"Burge","year":"1997","journal-title":"J. Mol. Biol."},{"key":"2023012512191747500_B8","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1007\/978-3-540-73499-4_56","article-title":"A machine learning approach to test data generation: a case study in evaluation of gene finders","volume-title":"Machine Learning and Data Mining in Pattern Recognition.","author":"Christiansen","year":"2007"},{"key":"2023012512191747500_B9","first-page":"28","article-title":"Taming the zoo of discrete HMM subspecies & some of their relatives","volume-title":"Biology, Computation and Linguistics, New Interdisciplinary Paradigms","author":"Christiansen","year":"2011"},{"key":"2023012512191747500_B10","doi-asserted-by":"crossref","first-page":"4636","DOI":"10.1093\/nar\/27.23.4636","article-title":"Improved microbial gene identification with GLIMMER","volume":"27","author":"Delcher","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B11","doi-asserted-by":"crossref","first-page":"837","DOI":"10.2307\/2531595","article-title":"Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach","volume":"44","author":"DeLong","year":"1988","journal-title":"Biometrics"},{"key":"2023012512191747500_B12","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis.","author":"Durbin","year":"1998"},{"key":"2023012512191747500_B13","doi-asserted-by":"crossref","first-page":"6441","DOI":"10.1093\/nar\/20.24.6441","article-title":"Assessment of protein coding measures","volume":"20","author":"Fickett","year":"1992","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B14","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1023\/A:1007425814087","article-title":"Factorial hidden Markov models","volume":"29","author":"Ghahramani","year":"1996","journal-title":"Mach. Learn."},{"key":"2023012512191747500_B15","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1089\/cmb.1997.4.127","article-title":"Finding genes in DNA with a Hidden Markov Model","volume":"4","author":"Henderson","year":"1997","journal-title":"J. Comp. Biol."},{"key":"2023012512191747500_B16","doi-asserted-by":"crossref","first-page":"012015","DOI":"10.1088\/1742-6596\/95\/1\/012015","article-title":"Deterministic annealing variant of variational Bayes method","volume":"95","author":"Katahira","year":"2008","journal-title":"J. Phys. Conf."},{"issue":"Suppl. 1","key":"2023012512191747500_B17","doi-asserted-by":"crossref","first-page":"D464","DOI":"10.1093\/nar\/gkn751","article-title":"EcoCyc: a comprehensive view of Escherichia coli biology","volume":"37","author":"Keseler","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B18","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1186\/1471-2105-5-59","article-title":"Gene finding in novel genomes","volume":"5","author":"Korf","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012512191747500_B19","first-page":"179","article-title":"Two methods for improving performance of an hmm and their application for gene finding","volume":"5","author":"Krogh","year":"1997","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023012512191747500_B20","doi-asserted-by":"crossref","first-page":"4768","DOI":"10.1093\/nar\/22.22.4768","article-title":"A hidden Markov model that finds genes in E.coli DNA","volume":"22","author":"Krogh","year":"1994","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B21","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1006\/jmbi.1994.1104","article-title":"Hidden Markov Models in computational biology : applications to protein modeling","volume":"235","author":"Krogh","year":"1994","journal-title":"J. Mol. Biol."},{"key":"2023012512191747500_B22","article-title":"A generalized hidden markov model for the recognition of human genes in dna","volume-title":"Proceedings of the Fourth International Conference on Intelligent System for Molecular Biology.","author":"Kulp","year":"1996"},{"key":"2023012512191747500_B23","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1038\/36786","article-title":"The complete genome sequence of the gram-positive bacterium bacillus subtilis","volume":"390","author":"Kunst","year":"1997","journal-title":"Nature"},{"key":"2023012512191747500_B24","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/1471-2105-4-21","article-title":"Easygene - a prokaryotic gene finder that ranks orfs by statistical significance","volume":"4","author":"Larsen","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012512191747500_B25","doi-asserted-by":"crossref","first-page":"6494","DOI":"10.1093\/nar\/gki937","article-title":"Gene identification in novel eukaryotic genomes by self-training algorithm","volume":"33","author":"Lomsadze","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B26","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1093\/nar\/26.4.1107","article-title":"GeneMark.hmm: new solutions for gene finding","volume":"26","author":"Lukashin","year":"1998","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B27","doi-asserted-by":"crossref","first-page":"3601","DOI":"10.1093\/nar\/gkg527","article-title":"GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders","volume":"31","author":"Majoros","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B28","doi-asserted-by":"crossref","first-page":"2878","DOI":"10.1093\/bioinformatics\/bth315","article-title":"TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders","volume":"20","author":"Majoros","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012512191747500_B29","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1186\/1471-2105-7-263","article-title":"Automatic generation of gene finders for eukaryotic species","volume":"7","author":"Munch","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012512191747500_B30","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden markov models and selected applications in speech recognition","volume":"77","author":"Rabiner","year":"1989","journal-title":"Proc. IEEE"},{"key":"2023012512191747500_B31","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1101\/gr.10.4.529","article-title":"Genie, gene finding in Drosophila melanogaster","volume":"10","author":"Reese","year":"2000","journal-title":"Genome Res."},{"key":"2023012512191747500_B32","doi-asserted-by":"crossref","first-page":"544","DOI":"10.1093\/nar\/26.2.544","article-title":"Microbial gene identification using interpolated Markov models","volume":"26","author":"Salzberg","year":"1998","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B33","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1613\/jair.912","article-title":"Parameter learning of logic programs for symbolic-statistical modeling","volume":"15","author":"Sato","year":"2001","journal-title":"J. Artif. Intell. Res."},{"key":"2023012512191747500_B34","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/s10472-009-9135-8","article-title":"Variational Bayes via propositionalized probability computation in PRISM","volume":"54","author":"Sato","year":"2008","journal-title":"Ann. Math. Artif. Intell."},{"key":"2023012512191747500_B35","author":"Sato","year":"2010","journal-title":"PRISM User Manual (Version 2.0)."},{"key":"2023012512191747500_B36","first-page":"24","article-title":"Generative modeling by PRISM","volume-title":"ICLP","author":"Sato","year":"2009"},{"key":"2023012512191747500_B37","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1023\/A:1007649326333","article-title":"Mixed memory markov models: Decomposing complex stochastic processes as mixtures of simpler ones","volume":"37","author":"Saul","year":"1999","journal-title":"Mach. Learn."},{"key":"2023012512191747500_B38","first-page":"341","article-title":"Automata-theoretic models of mutation and alignment","volume":"3","author":"Searls","year":"1995","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023012512191747500_B39","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1093\/bioinformatics\/15.11.874","article-title":"Finding prokaryotic genes by the frame-by-frame' algorithm: targeting gene starts and overlapping genes","volume":"15","author":"Shmatkov","year":"1999","journal-title":"Bioinformatics"},{"key":"2023012512191747500_B40","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1093\/nar\/10.1.141","article-title":"Codon preference and its use in identifying protein coding regions in long DNA sequences","volume":"10","author":"Staden","year":"1982","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B41","doi-asserted-by":"crossref","first-page":"505","DOI":"10.1093\/nar\/12.1Part2.505","article-title":"Computer methods to locate signals in nucleic acid sequences","volume":"12","author":"Staden","year":"1984","journal-title":"Nucleic Acids Res."},{"key":"2023012512191747500_B42","first-page":"369","article-title":"Optimally parsing a sequence into different classes based on multiple types of information","volume-title":"Proceedings of Second International Conference on Intelligent Systems for Molecular Biology.","author":"Stormo","year":"1994"},{"key":"2023012512191747500_B43","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1016\/S0893-6080(97)00133-0","article-title":"Deterministic annealing em algorithm","volume":"11","author":"Ueda","year":"1998","journal-title":"Neural Netw."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/5\/636\/48874933\/bioinformatics_28_5_636.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/5\/636\/48874933\/bioinformatics_28_5_636.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T15:21:48Z","timestamp":1674660108000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/5\/636\/246975"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,1,3]]},"references-count":43,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2012,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr698","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,3,1]]},"published":{"date-parts":[[2012,1,3]]}}}