{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T03:37:08Z","timestamp":1773200228423,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1201,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential to improved genome annotation and better understanding of the regulatory mechanisms and stability of mRNA.<\/jats:p><jats:p>Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge.<\/jats:p><jats:p>Results: We propose a novel machine-learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we used hidden Markov models for fitting the DNA sequence dynamics, and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance.<\/jats:p><jats:p>We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14 740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false-negative rate and false-positive rate by 26, 15 and 35%, respectively. Meanwhile, our method makes \u223c30% fewer error predictions relative to the other string kernels. Furthermore, our method can be used to visualize the importance of oligomers and positions in predicting poly(A) motifs, from which we can observe a number of characteristics in the surrounding regions of true and false motifs that have not been reported before.<\/jats:p><jats:p>Availability: \u00a0http:\/\/sfb.kaust.edu.sa\/Pages\/Software.aspx<\/jats:p><jats:p>Contact: \u00a0lsong@cc.gatech.edu or xin.gao@kaust.edu.sa<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt218","type":"journal-article","created":{"date-parts":[[2013,6,27]],"date-time":"2013-06-27T05:33:26Z","timestamp":1372311206000},"page":"i316-i325","source":"Crossref","is-referenced-by-count":45,"title":["Poly(A) motif prediction using spectral latent features from human DNA sequences"],"prefix":"10.1093","volume":"29","author":[{"given":"Bo","family":"Xie","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Boris R.","family":"Jankovic","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vladimir B.","family":"Bajic","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Le","family":"Song","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2013,6,19]]},"reference":[{"key":"2023062614315558100_btt218-B1","doi-asserted-by":"crossref","first-page":"135","DOI":"10.3233\/ISB-2009-0395","article-title":"Prediction of polyadenylation signals in human DNA sequences using nucleotide frequencies","volume":"9","author":"Ahmed","year":"2009","journal-title":"In Silico Biol"},{"key":"2023062614315558100_btt218-B2","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1186\/1471-2164-11-646","article-title":"Polyar, a new computer program for prediction of poly(a) sites in human sequences","volume":"11","author":"Akhtar","year":"2010","journal-title":"BMC Genomics"},{"key":"2023062614315558100_btt218-B3","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1101\/gr.10.7.1001","article-title":"Patterns of variant polyadenylation signal usage in human genes","volume":"10","author":"Beaudoing","year":"2000","journal-title":"Genome Res."},{"key":"2023062614315558100_btt218-B4","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1016\/0968-0004(89)90011-X","article-title":"Poly(a), poly(a) binding protein and the regulation of mRNA stability","volume":"14","author":"Bernstein","year":"1989","journal-title":"Trends Biochem. Sci."},{"key":"2023062614315558100_btt218-B5","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1007\/s11517-011-0732-4","article-title":"Characterization and prediction of mRNA polyadenylation sites in human genes","volume":"49","author":"Chang","year":"2011","journal-title":"Med. Biol. Eng. Comput."},{"key":"2023062614315558100_btt218-B6","doi-asserted-by":"crossref","first-page":"2320","DOI":"10.1093\/bioinformatics\/btl394","article-title":"Prediction of mRNA polyadenylation sites by support vector machine","volume":"22","author":"Cheng","year":"2006","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B7","doi-asserted-by":"crossref","first-page":"2346","DOI":"10.1128\/jvi.71.3.2346-2356.1997","article-title":"A conserved hairpin motif in the r-u5 region of the human immunodeficiency virus type 1 RNA genome is essential for replication","volume":"71","author":"Das","year":"1997","journal-title":"J. Virol."},{"key":"2023062614315558100_btt218-B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. B"},{"key":"2023062614315558100_btt218-B9","doi-asserted-by":"crossref","first-page":"14055","DOI":"10.1073\/pnas.96.24.14055","article-title":"In silico detection of control signals: mRNA 3\u2032-end-processing sequences in diverse species","volume":"96","author":"Graber","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023062614315558100_btt218-B10","doi-asserted-by":"crossref","first-page":"1460","DOI":"10.1016\/j.jcss.2011.12.025","article-title":"A spectral algorithm for learning hidden Markov models","volume":"78","author":"Hsu","year":"2012","journal-title":"J. Comput. Syst. Sci."},{"key":"2023062614315558100_btt218-B11","doi-asserted-by":"crossref","first-page":"1485","DOI":"10.1261\/rna.2107305","article-title":"Bioinformatic identification of candidate cis-regulatory elements involved in human mrna polyadenylation","volume":"11","author":"Hu","year":"2005","journal-title":"RNA"},{"key":"2023062614315558100_btt218-B12","first-page":"819","article-title":"Probability product kernels","volume":"5","author":"Jebara","year":"2004","journal-title":"J. Mach. Learn. Res."},{"key":"2023062614315558100_btt218-B13","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1016\/j.jtbi.2010.05.015","article-title":"A classification-based prediction model of messenger rna polyadenylation sites","volume":"265","author":"Ji","year":"2010","journal-title":"J. Theor. Biol."},{"key":"2023062614315558100_btt218-B14","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/btt161","article-title":"Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences","author":"Kalkatawi","year":"2013","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B15","doi-asserted-by":"crossref","first-page":"1503","DOI":"10.1016\/S0002-9440(10)62576-X","article-title":"Polya deletions in hereditary nonpolyposis colorectal cancer: mutations before a gatekeeper","volume":"160","author":"Kim","year":"2002","journal-title":"Am. J. Pathol."},{"key":"2023062614315558100_btt218-B16","doi-asserted-by":"crossref","first-page":"4035","DOI":"10.1038\/emboj.2012.252","article-title":"A complex immunodeficiency is based on u1 snrnp-mediated poly(a) site suppression","volume":"31","author":"Langemeier","year":"2012","journal-title":"EMBO J."},{"key":"2023062614315558100_btt218-B17","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/1471-2164-4-7","article-title":"Sequence determinants in human polyadenylation site selection","volume":"4","author":"Legendre","year":"2003","journal-title":"BMC Genomics"},{"key":"2023062614315558100_btt218-B18","author":"Leslie","year":"2002"},{"key":"2023062614315558100_btt218-B19","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1093\/bioinformatics\/btg431","article-title":"Mismatch string kernels for discriminative protein classification","volume":"20","author":"Leslie","year":"2004","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B20","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1016\/j.molcel.2011.04.015","article-title":"Poly(adp-ribose) regulates stress responses and microrna activity in the cytoplasm","volume":"42","author":"Leung","year":"2011","journal-title":"Mol. Cell"},{"key":"2023062614315558100_btt218-B21","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1093\/bioinformatics\/bth437","article-title":"Dnafsminer: a web-based software toolbox to recognize two types of functional sites in dna sequences","volume":"21","author":"Liu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B22","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1093\/nar\/26.4.1107","article-title":"Genemark.hmm: new solutions for gene finding","volume":"26","author":"Lukashin","year":"1998","journal-title":"Nucleic Acids Res."},{"key":"2023062614315558100_btt218-B23","article-title":"A spectral algorithm for latent junction trees","volume-title":"Uncertainty in Artificial Intelligence","author":"Parikh","year":"2012"},{"key":"2023062614315558100_btt218-B24","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1038\/sj.ejhg.5201517","article-title":"Stability of bat26 in tumours of hereditary nonpolyposis colorectal cancer patients with msh2 intragenic deletion","volume":"14","author":"Pastrello","year":"2006","journal-title":"Eur. J. Hum. Genet."},{"key":"2023062614315558100_btt218-B25","doi-asserted-by":"crossref","first-page":"1770","DOI":"10.1101\/gad.17268411","article-title":"Ending the message: poly(a) signals then and now","volume":"25","author":"Proudfoot","year":"2011","journal-title":"Genes Dev."},{"key":"2023062614315558100_btt218-B26","doi-asserted-by":"crossref","first-page":"277","DOI":"10.7551\/mitpress\/4057.003.0018","article-title":"Accurate splice site detection for caenorhabditis elegans","volume-title":"Kernel Methods in Computational Biology","author":"R\u00e4tsch","year":"2004"},{"key":"2023062614315558100_btt218-B27","doi-asserted-by":"crossref","first-page":"i369","DOI":"10.1093\/bioinformatics\/bti1053","article-title":"Rase: recognition of alternatively spliced exons in c. elegans","volume":"21","author":"R\u00e4tsch","year":"2005","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B28","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1186\/1471-2164-7-176","article-title":"Similarities and differences of polyadenylation signals in human and fly","volume":"7","author":"Retelska","year":"2006","journal-title":"BMC Genomics"},{"key":"2023062614315558100_btt218-B29","first-page":"23","article-title":"Recognition of 3\u2032-processing sites of human mrna precursors","volume":"13","author":"Salamov","year":"1997","journal-title":"Comput. Appl. Biosci."},{"key":"2023062614315558100_btt218-B30","doi-asserted-by":"crossref","first-page":"e472","DOI":"10.1093\/bioinformatics\/btl250","article-title":"Arts: accurate recognition of transcription starts in human","volume":"22","author":"Sonnenburg","year":"2006","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B31","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1186\/1471-2105-8-S10-S7","article-title":"Accurate splice site prediction using support vector machines","volume":"8","author":"Sonnenburg","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023062614315558100_btt218-B32","doi-asserted-by":"crossref","first-page":"i6","DOI":"10.1093\/bioinformatics\/btn170","article-title":"POIMs: positional oligomer importance matrices\u2013understanding support vector machine-based signal detectors","volume":"24","author":"Sonnenburg","year":"2008","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B33","doi-asserted-by":"crossref","first-page":"ii215","DOI":"10.1093\/bioinformatics\/btg1080","article-title":"Gene prediction with a hidden Markov model and a new intron submodel","volume":"19","author":"Stanke","year":"2003","journal-title":"Bioinformatics"},{"key":"2023062614315558100_btt218-B34","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/S0378-1119(99)00104-3","article-title":"Detection of polyadenylation signals in human DNA sequences","volume":"231","author":"Tabaska","year":"1999","journal-title":"Gene"},{"key":"2023062614315558100_btt218-B35","doi-asserted-by":"crossref","first-page":"1000","DOI":"10.1093\/nar\/28.4.1000","article-title":"Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals","volume":"28","author":"van Helden","year":"2000","journal-title":"Nucleic Acids Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/13\/i316\/50703047\/bioinformatics_29_13_i316.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/13\/i316\/50703047\/bioinformatics_29_13_i316.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,13]],"date-time":"2024-05-13T02:35:40Z","timestamp":1715567740000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/13\/i316\/191301"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,6,19]]},"references-count":35,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2013,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt218","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2013,7]]},"published":{"date-parts":[[2013,6,19]]}}}