{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,14]],"date-time":"2025-12-14T08:17:34Z","timestamp":1765700254562},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The increasing diversity and variable quality of evidence relevant to gene annotation argues for a probabilistic framework that automatically integrates such evidence to yield candidate gene models.<\/jats:p>\n               <jats:p>Results: Evigan is an automated gene annotation program for eukaryotic genomes, employing probabilistic inference to integrate multiple sources of gene evidence. The probabilistic model is a dynamic Bayes network whose parameters are adjusted to maximize the probability of observed evidence. Consensus gene predictions are then derived by maximum likelihood decoding, yielding n-best models (with probabilities for each). Evigan is capable of accommodating a variety of evidence types, including (but not limited to) gene models computed by diverse gene finders, BLAST hits, EST matches, and splice site predictions; learned parameters encode the relative quality of evidence sources. Since separate training data are not required (apart from the training sets used by individual gene finders), Evigan is particularly attractive for newly sequenced genomes where little or no reliable manually curated annotation is available. The ability to produce a ranked list of alternative gene models may facilitate identification of alternatively spliced transcripts. 
Experimental application to ENCODE regions of the human genome, and the genomes of Plasmodium vivax and Arabidopsis thaliana show that Evigan achieves better performance than any of the individual data sources used as evidence.<\/jats:p>\n               <jats:p>Availability: The source code is available at http:\/\/www.seas.upenn.edu\/~strctlrn\/evigan\/evigan.html<\/jats:p>\n               <jats:p>Contact: qianliu@seas.upenn.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn004","type":"journal-article","created":{"date-parts":[[2008,1,11]],"date-time":"2008-01-11T01:13:48Z","timestamp":1200014028000},"page":"597-605","source":"Crossref","is-referenced-by-count":35,"title":["Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction"],"prefix":"10.1093","volume":"24","author":[{"given":"Qian","family":"Liu","sequence":"first","affiliation":[{"name":"1 Department of Computer and Information Science, 2Department of Biology and 3Penn Genomics Institute, University of Pennsylvania, Philadelphia PA 19104, USA"}]},{"given":"Aaron J.","family":"Mackey","sequence":"additional","affiliation":[{"name":"1 Department of Computer and Information Science, 2Department of Biology and 3Penn Genomics Institute, University of Pennsylvania, Philadelphia PA 19104, USA"}]},{"given":"David S.","family":"Roos","sequence":"additional","affiliation":[{"name":"1 Department of Computer and Information Science, 2Department of Biology and 3Penn Genomics Institute, University of Pennsylvania, Philadelphia PA 19104, USA"}]},{"given":"Fernando C. 
N.","family":"Pereira","sequence":"additional","affiliation":[{"name":"1 Department of Computer and Information Science, 2Department of Biology and 3Penn Genomics Institute, University of Pennsylvania, Philadelphia PA 19104, USA"}]}],"member":"286","published-online":{"date-parts":[[2008,1,10]]},"reference":[{"key":"2023020210105505100_B1","doi-asserted-by":"crossref","first-page":"743","DOI":"10.1093\/bioinformatics\/16.8.743","article-title":"gff2ps: visualizing genomic annotations","volume":"16","author":"Abril","year":"2000","journal-title":"Bioinformatics"},{"key":"2023020210105505100_B2","doi-asserted-by":"crossref","DOI":"10.1101\/gr.1562804","article-title":"Computational gene prediction using multiple sources of gene evidence","volume":"14","author":"Allen","year":"2004","journal-title":"Genome Res"},{"key":"2023020210105505100_B3","doi-asserted-by":"crossref","first-page":"3596","DOI":"10.1093\/bioinformatics\/bti609","article-title":"JIGSAW: integration of multiple sources of evidence for gene prediction","volume":"21","author":"Allen","year":"2005","journal-title":"Bioinformatics"},{"issue":"Suppl 1","key":"2023020210105505100_B4","doi-asserted-by":"crossref","first-page":"S9","DOI":"10.1186\/gb-2006-7-s1-s9","article-title":"JIGSAW, GeneZilla and GlimmerHMM: puzzling out the feature of human genes in the ENCODE regions","volume":"7","author":"Allen","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210105505100_B5","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. 
Biol"},{"issue":"Suppl 1","key":"2023020210105505100_B6","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/gb-2006-7-s1-s5","article-title":"Pairagon+NSCAN_EST: a model-based gene annotation pipeline","volume":"7","author":"Arumugam","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210105505100_B7","doi-asserted-by":"crossref","first-page":"e54","DOI":"10.1371\/journal.pcbi.0030054","article-title":"Global discriminative learning for higher-accuracy computational gene prediction","volume":"3","author":"Bernal","year":"2007","journal-title":"PLoS Comput. Biol"},{"issue":"Suppl 1","key":"2023020210105505100_B8","doi-asserted-by":"crossref","first-page":"i57","DOI":"10.1093\/bioinformatics\/bti1040","article-title":"ExonHunter: a comprehensive approach to gene finding","volume":"21","author":"Brejova","year":"2005","journal-title":"Bioinformatics"},{"key":"2023020210105505100_B9","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1006\/jmbi.1997.0951","article-title":"Prediction of complete gene structures in human genomic DNA","volume":"268","author":"Burge","year":"1997","journal-title":"J. Mol. Biol"},{"issue":"Suppl 1","key":"2023020210105505100_B10","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1186\/gb-2006-7-s1-s6","article-title":"Vertebrate gene finding from multiple-species alignments using a two-level strategy","volume":"7","author":"Carter","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210105505100_B11","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/S0166-6851(01)00363-2","article-title":"Phat: a gene finding program for Plasmodium falciparum","volume":"118","author":"Cawley","year":"2001","journal-title":"Mol. Biochem. Parasitol"},{"key":"2023020210105505100_B12","first-page":"33","article-title":"Large multiple organism gene finding by collapsed Gibbs sampling","volume":"99","author":"Chatterji","year":"2005","journal-title":"J. Comput. 
Biol"},{"key":"2023020210105505100_B13","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btm133","article-title":"Genomix: a method for combining gene-finders predictions, which uses evolutionary conservation of sequence and intron-exon structure","volume":"23","author":"Coghlan","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020210105505100_B14","doi-asserted-by":"crossref","first-page":"942","DOI":"10.1101\/gr.1858004","article-title":"The Ensembl automatic gene annotation system","volume":"14","author":"Curwen","year":"2004","journal-title":"Genome Res"},{"key":"2023020210105505100_B15","first-page":"1","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. Roy. Stat. Soc., Series B (Methodological)"},{"issue":"Suppl 1","key":"2023020210105505100_B16","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1186\/gb-2006-7-s1-s7","article-title":"Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA","volume":"7","author":"Djebali","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210105505100_B17","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids.","author":"Durbin","year":"1998"},{"key":"2023020210105505100_B18","doi-asserted-by":"crossref","first-page":"R13","DOI":"10.1186\/gb-2007-8-1-r13","article-title":"Creating a honey bee consensus gene set","volume":"8","author":"Elsik","year":"2007","journal-title":"Genome Biol"},{"key":"2023020210105505100_B19","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.1105136","article-title":"The ENCODE (ENCyclopedia Of DNA Elements) project","volume":"306","author":"ENCODE project consortium","year":"2004","journal-title":"Science"},{"issue":"Suppl 
1","key":"2023020210105505100_B20","doi-asserted-by":"crossref","first-page":"S8","DOI":"10.1186\/gb-2006-7-s1-s8","article-title":"Using several pair-wise informant sequences for de novo prediction of alternatively spliced transcripts","volume":"7","author":"Flicek","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210105505100_B21","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1101\/gr.830003","article-title":"Leveraging the mouse genome for gene prediction in human: from whole-genome shotgun reads to a global synteny map","volume":"13","author":"Flicek","year":"2001","journal-title":"Genome Res"},{"key":"2023020210105505100_B22","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1038\/nmeth0805-575","article-title":"EGASP: collaboration through competition to find human genes","volume":"2","author":"Guigo","year":"2005","journal-title":"Nat. Methods"},{"issue":"Suppl 1","key":"2023020210105505100_B23","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/gb-2006-7-s1-s2","article-title":"EGASP: The human ENCODE genome annotation assessment project","volume":"7","author":"Guigo","year":"2006","journal-title":"Genome Biol"},{"key":"2023020210105505100_B24","doi-asserted-by":"crossref","DOI":"10.1186\/gb-2002-3-6-research0029","article-title":"Full-length messenger RNA sequences greatly improve genome annotation","volume":"3","author":"Haas","year":"2002","journal-title":"Genome Biol"},{"key":"2023020210105505100_B25","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1101\/gr.149502","article-title":"GAZE: a generic framework for the integration of gene-prediction data by dynamic programming","volume":"12","author":"Howe","year":"2002","journal-title":"Genome Res"},{"key":"2023020210105505100_B26","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1006\/geno.1997.4984","article-title":"A tool for analyzing and annotating genomic 
sequences","volume":"46","author":"Huang","year":"1997","journal-title":"Genomics"},{"volume-title":"Learning in Graphical Models.","year":"1999","author":"Jordan","key":"2023020210105505100_B27"},{"key":"2023020210105505100_B28","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1186\/1471-2105-4-50","article-title":"Eval: a software package for analysis of genome annotations","volume":"4","author":"Keibler","year":"2003","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"2023020210105505100_B29","doi-asserted-by":"crossref","first-page":"S140","DOI":"10.1093\/bioinformatics\/17.suppl_1.S140","article-title":"Integrating genomic homology into gene structure prediction","volume":"17","author":"Korf","year":"2001","journal-title":"Bioinformatics"},{"key":"2023020210105505100_B30","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1093\/nar\/26.4.1107","article-title":"GeneMark.hmm: new solutions for gene finding","volume":"26","author":"Lukashin","year":"1998","journal-title":"Nucl. Acids Res"},{"key":"2023020210105505100_B31","doi-asserted-by":"crossref","first-page":"2878","DOI":"10.1093\/bioinformatics\/bth315","article-title":"TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders","volume":"20","author":"Majoros","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020210105505100_B32","doi-asserted-by":"crossref","DOI":"10.4269\/ajtmh.2001.64.97","article-title":"The neglected burden of Plasmodium vivax malaria","volume":"64","author":"Mendis","year":"2001","journal-title":"Am. J. Tropical. Med. 
Hygiene"},{"key":"2023020210105505100_B33","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1093\/bioinformatics\/14.8.665","article-title":"Gene recognition by combination of several gene-finding programs","volume":"14","author":"Murakami","year":"1998","journal-title":"Bioinformatics"},{"article-title":"Dynamic Bayesian Networks: representation, inference and learning","year":"2002","author":"Murphy","key":"2023020210105505100_B34"},{"key":"2023020210105505100_B35","first-page":"467","article-title":"Loopy belief propagation for approximate inference: an empirical study","author":"Murphy","year":"1999"},{"key":"2023020210105505100_B36","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1101\/gr.10.4.511","article-title":"GeneID in Drosophila","volume":"10","author":"Parra","year":"2000","journal-title":"Genome Res"},{"key":"2023020210105505100_B37","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1093\/bioinformatics\/18.1.19","article-title":"A Bayesian framework for combining gene predictions","volume":"18","author":"Pavlovic","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020210105505100_B38","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1093\/nar\/29.5.1185","article-title":"GeneSplicer: a new computational method for splice site prediction","volume":"29","author":"Pertea","year":"2001","journal-title":"Nucl. Acids Res"},{"key":"2023020210105505100_B39","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1023\/A:1013770123580","article-title":"Computational gene finding in plants","volume":"48","author":"Pertea","year":"2002","journal-title":"Plant Mol. 
Biol"},{"key":"2023020210105505100_B40","first-page":"257","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","author":"Rabiner","year":"1989"},{"key":"2023020210105505100_B41","doi-asserted-by":"crossref","first-page":"1034","DOI":"10.1093\/bioinformatics\/18.8.1034","article-title":"Improving gene recognition accuracy by combining predictions from two gene-finding programs","volume":"18","author":"Rogic","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020210105505100_B42","first-page":"118","article-title":"EuG\u00e8ne, an eukaryotic gene finder that combines several type of evidence","author":"Schiex","year":"2001","journal-title":"Comput. Biol"},{"key":"2023020210105505100_B43","first-page":"81","article-title":"The n-best algorithm: an efficient and exact procedure for finding the n most likely sentence hypotheses","author":"Schwartz","year":"1990"},{"issue":"Suppl 2","key":"2023020210105505100_B44","doi-asserted-by":"crossref","first-page":"II215","DOI":"10.1093\/bioinformatics\/btg1080","article-title":"Gene prediction with a hidden Markov model and a new intron submodel","volume":"19","author":"Stanke","year":"2003","journal-title":"Bioinformatics"},{"issue":"Suppl 1","key":"2023020210105505100_B45","doi-asserted-by":"crossref","first-page":"S11","DOI":"10.1186\/gb-2006-7-s1-s11","article-title":"AUGUSTUS at EGASP: using EST, protein and genomic alignments for improved gene prediction in the human genome","volume":"7","author":"Stanke","year":"2006","journal-title":"Genome Biol"},{"issue":"Suppl 1","key":"2023020210105505100_B46","doi-asserted-by":"crossref","first-page":"S10","DOI":"10.1186\/gb-2006-7-s1-s10","article-title":"Automatic annotation of eukaryotic genes, pseudogenes and promoters","volume":"7","author":"Solovyev","year":"2006","journal-title":"Genome 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/5\/597\/49051279\/bioinformatics_24_5_597.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/5\/597\/49051279\/bioinformatics_24_5_597.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T11:47:28Z","timestamp":1675338448000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/5\/597\/202036"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,1,10]]},"references-count":46,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2008,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn004","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2008,3,1]]},"published":{"date-parts":[[2008,1,10]]}}}