{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,1]],"date-time":"2025-09-01T15:10:10Z","timestamp":1756739410239,"version":"3.44.0"},"reference-count":0,"publisher":"Oxford University Press (OUP)","issue":"Suppl_3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The vast majority of introns in protein-coding genes of higher eukaryotes have a GT dinucleotide at their 5\u2032-terminus and an AG dinucleotide at their 3\u2032 end. About 1\u20132% of introns are non-canonical, with the most abundant subtype of non-canonical introns being characterized by GC and AG dinucleotides at their 5\u2032- and 3\u2032-termini, respectively. Most current gene prediction software, whether based on ab initio or spliced alignment approaches, does not include explicit models for non-canonical introns or may exclude their prediction altogether. With present amounts of genome and transcript data, it is now possible to apply statistical methodology to non-canonical splice site prediction. We pursued one such approach and describe the training and implementation of GC-donor splice site models for Arabidopsis and rice, with the goal of exploring whether specific modeling of non-canonical introns can enhance gene structure prediction accuracy.<\/jats:p>\n               <jats:p>Results: Our results indicate that the incorporation of non-canonical splice site models yields dramatic improvements in annotating genes containing GC\u2013AG and AT\u2013AC non-canonical introns. Comparison of models shows differences between monocot and dicot species, but also suggests GC intron-specific biases independent of taxonomic clade. 
We also present evidence that GC\u2013AG introns occur preferentially in genes with atypically high exon counts.<\/jats:p>\n               <jats:p>Availability: Source code for the updated versions of GeneSeqer and SplicePredictor (distributed with the GeneSeqer code) is available at . Web servers for Arabidopsis, rice and other plant species are accessible at , and , respectively. A SplicePredictor web server is available at . Software to generate training data and parameterizations for Bayesian splice site models is available at<\/jats:p>\n               <jats:p>Contact: vbrendel@iastate.edu<\/jats:p>\n               <jats:p>Supporting information: \u00a0<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti1205","type":"journal-article","created":{"date-parts":[[2005,11,23]],"date-time":"2005-11-23T15:38:19Z","timestamp":1132760299000},"page":"iii20-iii30","source":"Crossref","is-referenced-by-count":25,"title":["Incorporation of splice site probability models for non-canonical introns improves gene structure prediction in plants"],"prefix":"10.1093","volume":"21","author":[{"given":"Michael E.","family":"Sparks","sequence":"first","affiliation":[{"name":"Department of Genetics, Development and Cell Biology, Iowa State University, 2112 Molecular Biology Building, Ames, IA 50011-3260, USA"}]},{"given":"Volker","family":"Brendel","sequence":"additional","affiliation":[{"name":"Department of Genetics, Development and Cell Biology, Iowa State University, 2112 Molecular Biology Building, Ames, IA 50011-3260, USA"},{"name":"Department of Statistics, Iowa State University, 2112 Molecular Biology Building, Ames, IA 50011-3260, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2005,11,1]]},"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/Suppl_3\/iii20\/216696","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/Suppl_3\/iii20\/216696","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,1]],"date-time":"2025-09-01T14:42:14Z","timestamp":1756737734000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/Suppl_3\/iii20\/216696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,11,1]]},"references-count":0,"journal-issue":{"issue":"Suppl_3","published-print":{"date-parts":[[2005,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti1205","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2005,11]]},"published":{"date-parts":[[2005,11,1]]}}}