{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T19:54:52Z","timestamp":1725566092097},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Gene-model curation creates consensus gene models by combining multiple sources of protein-coding evidence that may be incomplete or inconsistent. To date, manual curation still produces the highest quality models. However, manual curation is too slow and costly to be completed even for the most important organisms. In recent years, machine-learned ensemble gene predictors have become a viable alternative to manual curation. Current approaches make use of signal and genomic region consistency among sources and some voting scheme to resolve conflicts in the evidence. As a further step in that direction, we have developed eCRAIG (ensemble CRAIG), an automated curation tool that combines multiple sources of evidence using global discriminative training. This allows efficient integration of different types of genomic evidence with complex statistical dependencies to maximize directly annotation accuracy. Our method goes beyond previous work in integrating novel non-linear annotation agreement features, as well as combinations of intrinsic features of the target sequence and extrinsic annotation features.<\/jats:p><jats:p>Results: We achieved significant improvements over the best ensemble predictors available for Homo sapiens, Caenorhabditis elegans and Arabidopsis thaliana. 
In particular, eCRAIG achieved a relative mean improvement of 5.1% over Jigsaw, the best published ensemble predictor in all our experiments.<\/jats:p><jats:p>Availability: The source code and datasets are both available at http:\/\/www.seas.upenn.edu\/abernal\/ecraig.tgz<\/jats:p><jats:p>Contact: \u00a0abernal@seas.upenn.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts176","type":"journal-article","created":{"date-parts":[[2012,4,19]],"date-time":"2012-04-19T00:25:22Z","timestamp":1334795122000},"page":"1571-1578","source":"Crossref","is-referenced-by-count":6,"title":["Automated gene-model curation using global discriminative learning"],"prefix":"10.1093","volume":"28","author":[{"given":"Axel","family":"Bernal","sequence":"first","affiliation":[{"name":"1 Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA, 2Department of Electrical Engineering, Technion. Israel Institute of Technology, Haifa 32000, Israel and 3Google Inc. Mountain View, CA, 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Koby","family":"Crammer","sequence":"additional","affiliation":[{"name":"1 Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA, 2Department of Electrical Engineering, Technion. Israel Institute of Technology, Haifa 32000, Israel and 3Google Inc. Mountain View, CA, 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fernando","family":"Pereira","sequence":"additional","affiliation":[{"name":"1 Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA, 2Department of Electrical Engineering, Technion. Israel Institute of Technology, Haifa 32000, Israel and 3Google Inc. 
Mountain View, CA, 94043, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2012,4,18]]},"reference":[{"key":"2023012512355305800_B1","doi-asserted-by":"crossref","first-page":"3596","DOI":"10.1093\/bioinformatics\/bti609","article-title":"JIGSAW: integration of multiple sources of evidence for gene prediction","volume":"21","author":"Allen","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512355305800_B2","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1101\/gr.1562804","article-title":"Computational gene prediction using multiple sources of evidence","volume":"14","author":"Allen","year":"2004","journal-title":"Genome Res."},{"issue":"Suppl. 1","key":"2023012512355305800_B3","first-page":"S5.1","article-title":"Pairagon+N-SCAN_EST: a model-based gene annotation pipeline","volume":"7","author":"Arumugam","year":"2006","journal-title":"Genome Biol."},{"key":"2023012512355305800_B4","doi-asserted-by":"crossref","first-page":"i41","DOI":"10.1093\/bioinformatics\/btm229","article-title":"Manual curation is not sufficient for annotation of genomic databases","volume":"23","author":"Baumgartner","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012512355305800_B5","doi-asserted-by":"crossref","first-page":"e54","DOI":"10.1371\/journal.pcbi.0030054","article-title":"Global discriminative learning for higher-accuracy computational gene prediction","volume":"3","author":"Bernal","year":"2007","journal-title":"PLoS Comput. Biol."},{"issue":"Suppl. 
1","key":"2023012512355305800_B6","doi-asserted-by":"crossref","first-page":"i57","DOI":"10.1093\/bioinformatics\/bti1040","article-title":"ExonHunter: a comprehensive approach to gene finding","volume":"21","author":"Brejov\u00e1","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512355305800_B7","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/S0959-440X(98)80069-9","article-title":"Finding the genes in genomic DNA","volume":"8","author":"Burge","year":"1998","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012512355305800_B8","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1186\/1471-2105-9-549","article-title":"nGASP\u2013the nematode genome annotation assessment project","volume":"9","author":"Coghlan","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012512355305800_B9","first-page":"551","article-title":"Online passive-aggressive algorithms","volume":"7","author":"Crammer","year":"2006","journal-title":"J. Mach. Learn. R"},{"key":"2023012512355305800_B10","article-title":"Adaptive regularization of weight vectors","author":"Crammer","year":"2009","journal-title":"Proc of NIPS."},{"key":"2023012512355305800_B11","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1145\/1390156.1390190","article-title":"Confidence-weighted linear classification","volume-title":"ICML '08: Proceedings of the 25th International Conference on Machine Learning","author":"Dredze","year":"2008"},{"key":"2023012512355305800_B12","doi-asserted-by":"crossref","first-page":"R13","DOI":"10.1186\/gb-2007-8-1-r13","article-title":"Creating a honey bee consensus gene set","volume":"8","author":"Elsik","year":"2007","journal-title":"Genome Biol."},{"key":"2023012512355305800_B13","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.1105136","article-title":"The ENCODE (ENCyclopedia Of DNA Elements) project","volume":"306","author":"ENCODE Project 
Consortium.","year":"2004","journal-title":"Science"},{"key":"2023012512355305800_B14","doi-asserted-by":"crossref","DOI":"10.1186\/gb-2006-7-s1-s2","article-title":"EGASP'05: ENCODE genome annotation assessment project","volume":"7","author":"Guigo","year":"2006","journal-title":"Genome Biol."},{"key":"2023012512355305800_B15","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1101\/gr.149502","article-title":"GAZE: a generic framework for the integration of gene-prediction data by dynamic programming","volume":"12","author":"Howe","year":"2002","journal-title":"Genome Res."},{"key":"2023012512355305800_B16","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1186\/1471-2105-4-50","article-title":"Eval: a software package for analysis of genome annotations","volume":"4","author":"Keibler","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012512355305800_B17","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1093\/bioinformatics\/btn004","article-title":"Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction","volume":"24","author":"Liu","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512355305800_B18","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1093\/bioinformatics\/18.1.19","article-title":"A bayesian framework for combining gene predictions","volume":"18","author":"Pavlovi\u0107","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012512355305800_B19","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1101\/gr.1859804","article-title":"The Ensembl analysis pipeline","volume":"14","author":"Potter","year":"2004","journal-title":"Genome Res."},{"key":"2023012512355305800_B20","first-page":"111","article-title":"EuGene: an eucaryotic gene finder that combines several sources of evidence","author":"Schiex","year":"2001","journal-title":"LNCS 2066"},{"issue":"Suppl. 
2","key":"2023012512355305800_B21","doi-asserted-by":"crossref","first-page":"II215","DOI":"10.1093\/bioinformatics\/btg1080","article-title":"Gene prediction with a hidden Markov model and a new intron submodel","volume":"19","author":"Stanke","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012512355305800_B22","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1093\/bioinformatics\/btn013","article-title":"Using native and syntenically mapped cDNA alignments to improve de novo gene finding","volume":"24","author":"Stanke","year":"2008","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/1571\/48883770\/bioinformatics_28_12_1571.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/12\/1571\/48883770\/bioinformatics_28_12_1571.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,22]],"date-time":"2023-06-22T02:45:46Z","timestamp":1687401946000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/12\/1571\/266487"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4,18]]},"references-count":22,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2012,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts176","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,6,15]]},"published":{"date-parts":[[2012,4,18]]}}}