{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T03:32:43Z","timestamp":1769916763832,"version":"3.49.0"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: 3\u2032 end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3\u2032 ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Although heuristic filters have been applied in these cases, they typically result in a high proportion of both false-positive and -negative classifications. Therefore, there is a need to develop improved algorithms to better identify mis-priming events in oligo-dT primed sequences.<\/jats:p>\n               <jats:p>Results: By analyzing sequence features flanking 3\u2032 ends derived from oligo-dT-based sequencing, we developed a na\u00efve Bayes classifier to classify them as true or false\/internally primed. The resulting algorithm is highly accurate, outperforms previous heuristic filters and facilitates identification of novel polyadenylation sites.<\/jats:p>\n               <jats:p>Contact: \u00a0nathan.lawson@umassmed.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt446","type":"journal-article","created":{"date-parts":[[2013,8,21]],"date-time":"2013-08-21T03:07:58Z","timestamp":1377054478000},"page":"2564-2571","source":"Crossref","is-referenced-by-count":30,"title":["Accurate identification of polyadenylation sites from 3\u2032 end deep sequencing using a na\u00efve Bayes classifier"],"prefix":"10.1093","volume":"29","author":[{"given":"Sarah","family":"Sheppard","sequence":"first","affiliation":[{"name":"1 Program in Gene Function and Expression and 2Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 364 Plantation St, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nathan D.","family":"Lawson","sequence":"additional","affiliation":[{"name":"1 Program in Gene Function and Expression and 2Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 364 Plantation St, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lihua Julie","family":"Zhu","sequence":"additional","affiliation":[{"name":"1 Program in Gene Function and Expression and 2Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 364 Plantation St, Worcester, MA 01605, USA"},{"name":"1 Program in Gene Function and Expression and 2Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 364 Plantation St, Worcester, MA 01605, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2013,8,20]]},"reference":[{"key":"2023012810465439700_btt446-B46","article-title":"Introduction to Machine Learning. Second Edition.","author":"Alpayd\u0131n","year":"2010"},{"key":"2023012810465439700_btt446-B1","first-page":"28","article-title":"Fitting a mixture model by expectation maximization to discover motifs in biopolymers","volume":"2","author":"Bailey","year":"1994","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023012810465439700_btt446-B2","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1101\/gr.10.7.1001","article-title":"Patterns of variant polyadenylation signal usage in human genes","volume":"10","author":"Beaudoing","year":"2000","journal-title":"Genome Res."},{"key":"2023012810465439700_btt446-B3","doi-asserted-by":"crossref","first-page":"3691","DOI":"10.1093\/bioinformatics\/bti589","article-title":"PACdb: PolyA cleavage site and 3\u2032-UTR database","volume":"21","author":"Brockman","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012810465439700_btt446-B4","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.1101\/gr.132563.111","article-title":"A quantitative atlas of polyadenylation in five mammals","volume":"22","author":"Derti","year":"2012","journal-title":"Genome Res."},{"key":"2023012810465439700_btt446-B5","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1101\/gr.115295.110","article-title":"Differential genome-wide profiling of tandem 3\u2032 UTRs among human breast cancer and normal cells by high-throughput sequencing","volume":"21","author":"Fu","year":"2011","journal-title":"Genome Res."},{"key":"2023012810465439700_btt446-B6","doi-asserted-by":"crossref","first-page":"14055","DOI":"10.1073\/pnas.96.24.14055","article-title":"In silico detection of control signals: mRNA 3\u2032-end-processing sequences in diverse species","volume":"96","author":"Graber","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012810465439700_btt446-B7","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/nbt.1633","article-title":"Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs","volume":"28","author":"Guttman","year":"2010","journal-title":"Nat Biotechnol."},{"key":"2023012810465439700_btt446-B8","doi-asserted-by":"crossref","first-page":"6304","DOI":"10.1093\/nar\/gks282","article-title":"Analysis of C. elegans intestinal gene expression and polyadenylation by fluorescence-activated nuclei sorting and 3\u2032-end-seq","volume":"40","author":"Haenni","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B9","volume-title":"Nonparametric Statistical Methods","author":"Hollander","year":"1999"},{"key":"2023012810465439700_btt446-B10","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1038\/nature12111","article-title":"The zebrafish reference genome sequence and its relationship to the human genome","volume":"496","author":"Howe","year":"2013","journal-title":"Nature"},{"key":"2023012810465439700_btt446-B11","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1038\/nature09616","article-title":"Formation, regulation and evolution of Caenorhabditis elegans 3\u2032UTRs","volume":"469","author":"Jan","year":"2011","journal-title":"Nature"},{"key":"2023012810465439700_btt446-B12","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1002\/aja.1002030302","article-title":"Stages of embryonic development of the zebrafish","volume":"203","author":"Kimmel","year":"1995","journal-title":"Dev. Dyn."},{"key":"2023012810465439700_btt446-B13","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/s10462-007-9052-3","article-title":"Machine learning: a review of classification and combining techniques","volume":"26","author":"Kotsiantis","year":"2006","journal-title":"Artif. Intell. Rev."},{"key":"2023012810465439700_btt446-B14","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol."},{"key":"2023012810465439700_btt446-B15","doi-asserted-by":"crossref","first-page":"1899","DOI":"10.1101\/gr.128488.111","article-title":"Dynamic landscape of tandem 3\u2032 UTRs during zebrafish development","volume":"22","author":"Li","year":"2012","journal-title":"Genome Res."},{"key":"2023012810465439700_btt446-B16","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1093\/nar\/gkl919","article-title":"Systematic variation in mRNA 3\u2032-processing signals during mouse spermatogenesis","volume":"35","author":"Liu","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B17","first-page":"226","article-title":"Tailing and 3\u2032-end labeling of RNA with yeast poly(A) polymerase and various nucleotides","volume":"4","author":"Martin","year":"1998","journal-title":"RNA"},{"key":"2023012810465439700_btt446-B18","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochim. Biophys. Acta"},{"key":"2023012810465439700_btt446-B20","doi-asserted-by":"crossref","first-page":"2757","DOI":"10.1093\/nar\/gkp1176","article-title":"Molecular mechanisms of eukaryotic pre-mRNA 3\u2032 end processing regulation","volume":"38","author":"Millevoi","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B22","first-page":"258","article-title":"Feature selection for unbalanced class distribution and Naive Bayes","volume-title":"Proceedings of the 16th International Conference on Machine Learning","author":"Mladenic","year":"1999"},{"key":"2023012810465439700_btt446-B23","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1016\/j.ceb.2012.12.008","article-title":"All's well that ends well: alternative polyadenylation and its implications for stem cell biology","volume":"25","author":"Mueller","year":"2013","journal-title":"Curr. Opin. Cell Biol."},{"key":"2023012810465439700_btt446-B24","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1016\/S0076-6879(08)02624-4","article-title":"Assays for determining poly(A) tail length and the polarity of mRNA decay in mammalian cells","volume":"448","author":"Murray","year":"2008","journal-title":"Methods Enzymol."},{"key":"2023012810465439700_btt446-B25","doi-asserted-by":"crossref","first-page":"6152","DOI":"10.1073\/pnas.092140899","article-title":"Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription","volume":"99","author":"Nam","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012810465439700_btt446-B26","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1101\/gr.133009.111","article-title":"Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis","volume":"22","author":"Pauli","year":"2012","journal-title":"Genome Res."},{"key":"2023012810465439700_btt446-B27","doi-asserted-by":"crossref","first-page":"1690","DOI":"10.1093\/nar\/29.8.1690","article-title":"Heterogeneity in polyadenylation cleavage sites in mammalian mRNA sequences: implications for SAGE analysis","volume":"29","author":"Pauws","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B28","doi-asserted-by":"crossref","first-page":"3836","DOI":"10.1093\/nar\/24.19.3836","article-title":"Searching databases of conserved sequence regions by aligning protein multiple-alignments","volume":"24","author":"Pietrokovski","year":"1996","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B29","doi-asserted-by":"crossref","first-page":"1770","DOI":"10.1101\/gad.17268411","article-title":"Ending the message: poly(A) signals then and now","volume":"25","author":"Proudfoot","year":"2011","journal-title":"Genes Dev."},{"key":"2023012810465439700_btt446-B30","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1038\/263211a0","article-title":"3\u2032 non-coding region sequences in eukaryotic messenger RNA","volume":"263","author":"Proudfoot","year":"1976","journal-title":"Nature"},{"key":"2023012810465439700_btt446-B31","volume-title":"R: A Language and Environment for Statistical Computing","author":"R Core Team","year":"2013"},{"key":"2023012810465439700_btt446-B32","doi-asserted-by":"crossref","first-page":"5799","DOI":"10.1093\/nar\/18.19.5799","article-title":"Point mutations in AAUAAA and the poly (A) addition site: effects on the accuracy and efficiency of cleavage and polyadenylation in vitro","volume":"18","author":"Sheets","year":"1990","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B33","doi-asserted-by":"crossref","first-page":"1478","DOI":"10.1101\/gr.114744.110","article-title":"Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing","volume":"21","author":"Shen","year":"2011","journal-title":"Genome Res."},{"key":"2023012810465439700_btt446-B34","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1261\/rna.2581711","article-title":"Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq","volume":"17","author":"Shepard","year":"2011","journal-title":"RNA"},{"key":"2023012810465439700_btt446-B35","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1038\/nsmb.2345","article-title":"Direct sequencing of Arabidopsis thaliana RNA reveals patterns of cleavage and polyadenylation","volume":"19","author":"Sherstnev","year":"2012","journal-title":"Nat. Struct. Mol. Biol."},{"key":"2023012810465439700_btt446-B36","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1016\/j.celrep.2012.01.001","article-title":"Global patterns of tissue-specific alternative polyadenylation in Drosophila","volume":"1","author":"Smibert","year":"2012","journal-title":"Cell Rep."},{"key":"2023012810465439700_btt446-B37","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1093\/nar\/gki158","article-title":"A large-scale analysis of mRNA polyadenylation of human and mouse genes","volume":"33","author":"Tian","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B38","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","article-title":"TopHat: discovering splice junctions with RNA-Seq","volume":"25","author":"Trapnell","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012810465439700_btt446-B39","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/nbt.1621","article-title":"Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation","volume":"28","author":"Trapnell","year":"2010","journal-title":"Nat. Biotechnol"},{"key":"2023012810465439700_btt446-B40","doi-asserted-by":"crossref","first-page":"1537","DOI":"10.1016\/j.cell.2011.11.055","article-title":"Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution","volume":"147","author":"Ulitsky","year":"2011","journal-title":"Cell"},{"key":"2023012810465439700_btt446-B41","doi-asserted-by":"crossref","first-page":"2054","DOI":"10.1101\/gr.139733.112","article-title":"Extensive alternative polyadenylation during zebrafish development","volume":"22","author":"Ulitsky","year":"2012","journal-title":"Genome Res."},{"key":"2023012810465439700_btt446-B42","volume-title":"The Zebrafish Book: A Guide for the Laboratory Use of Zebrafish (Brachydanio rerio)","author":"Westerfield","year":"1993"},{"key":"2023012810465439700_btt446-B43","doi-asserted-by":"crossref","first-page":"e65","DOI":"10.1093\/nar\/gks1249","article-title":"An efficient method for genome-wide polyadenylation site mapping and RNA quantification","volume":"41","author":"Wilkening","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023012810465439700_btt446-B44","doi-asserted-by":"crossref","first-page":"12533","DOI":"10.1073\/pnas.1019732108","article-title":"Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation","volume":"108","author":"Wu","year":"2011","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012810465439700_btt446-B45","doi-asserted-by":"crossref","first-page":"R100","DOI":"10.1186\/gb-2005-6-12-r100","article-title":"Biased alternative polyadenylation in human tissues","volume":"6","author":"Zhang","year":"2005","journal-title":"Genome Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/20\/2564\/48891723\/bioinformatics_29_20_2564.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/20\/2564\/48891723\/bioinformatics_29_20_2564.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,28]],"date-time":"2023-01-28T12:38:25Z","timestamp":1674909505000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/20\/2564\/277787"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,20]]},"references-count":44,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2013,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt446","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,10,15]]},"published":{"date-parts":[[2013,8,20]]}}}