{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T11:39:33Z","timestamp":1767181173985,"version":"build-2238731810"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008297","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,11,17]],"date-time":"2020-11-17T00:00:00Z","timestamp":1605571200000}}],"reference-count":52,"publisher":"Public Library of Science (PLoS)","issue":"11","license":[{"start":{"date-parts":[[2020,11,5]],"date-time":"2020-11-05T00:00:00Z","timestamp":1604534400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research","award":["URF\/1\/2602-01"],"award-info":[{"award-number":["URF\/1\/2602-01"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["P41 GM103712"],"award-info":[{"award-number":["P41 GM103712"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research","award":["URF\/1\/3007-01"],"award-info":[{"award-number":["URF\/1\/3007-01"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM134020"],"award-info":[{"award-number":["R01GM134020"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-1949629"],"award-info":[{"award-number":["DBI-1949629"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-2007595"],"award-info":[{"award-number":["IIS-2007595"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the\n                    <jats:italic>cis<\/jats:italic>\n                    -determinants of poly(A) signal (PAS) on the DNA sequence is the key to understand the mechanism of translation regulation and mRNA metabolism. Although machine learning methods were widely used in computationally identifying PAS, the need for tremendous amounts of annotation data hinder applications of existing methods in species without experimental data on PAS. Therefore, cross-species PAS identification, which enables the possibility to predict PAS from untrained species, naturally becomes a promising direction. In our works, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolution Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without re-training. To test our method, we use four species and build cross-species training sets with two of them and evaluate the performance of the remaining ones. Moreover, we test our method against insufficient data and imbalanced data issues and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy when it comes to a smaller or imbalanced training set.\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1008297","type":"journal-article","created":{"date-parts":[[2020,11,5]],"date-time":"2020-11-05T13:55:57Z","timestamp":1604584557000},"page":"e1008297","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":12,"title":["Poly(A)-DG: A deep-learning-based domain generalization method to identify cross-species Poly(A) signal without prior knowledge from target species"],"prefix":"10.1371","volume":"16","author":[{"given":"Yumin","family":"Zheng","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1826-4069","authenticated-orcid":true,"given":"Haohan","family":"Wang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1483-9195","authenticated-orcid":true,"given":"Yang","family":"Zhang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7108-3574","authenticated-orcid":true,"given":"Xin","family":"Gao","sequence":"additional","affiliation":[]},{"given":"Eric P.","family":"Xing","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0881-5891","authenticated-orcid":true,"given":"Min","family":"Xu","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2020,11,5]]},"reference":[{"key":"pcbi.1008297.ref001","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1186\/1471-2105-8-43","article-title":"Predictive modeling of plant messenger RNA polyadenylation sites","volume":"8","author":"G Ji","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1008297.ref002","doi-asserted-by":"crossref","first-page":"3578","DOI":"10.1182\/blood.V126.23.3578.3578","article-title":"An Intronic Suppressor Element Regulates RUNX1 Alternative Polyadenylation","volume":"126","author":"A Scholl","year":"2015","journal-title":"Blood"},{"issue":"7","key":"pcbi.1008297.ref003","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/S0968-0004(96)10030-X","article-title":"The biochemistry of polyadenylation","volume":"21","author":"E Wahle","year":"1996","journal-title":"Trends in biochemical sciences"},{"key":"pcbi.1008297.ref004","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1016\/S0959-437X(97)80132-3","article-title":"Life and death in the cytoplasm: messages from the 3\u2019 end","volume":"7","author":"M Wickens","year":"1997","journal-title":"Current Opinion in Genetics & Development"},{"key":"pcbi.1008297.ref005","doi-asserted-by":"crossref","first-page":"2755","DOI":"10.1101\/gad.11.21.2755","article-title":"Mechanism and regulation of mRNA polyadenylation","volume":"11","author":"DF Colgan","year":"1997","journal-title":"Genes & development"},{"key":"pcbi.1008297.ref006","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1101\/gr.10.7.1001","article-title":"Patterns of Variant Polyadenylation Signal Usage in Human Genes","volume":"10","author":"E Beaudoing","year":"2000","journal-title":"Genome Research"},{"key":"pcbi.1008297.ref007","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0303-7207(02)00044-8","article-title":"Reexamining the polyadenylation signal: were we wrong about AAUAAA?","volume":"190","author":"CC MacDonald","year":"2002","journal-title":"Molecular and Cellular Endocrinology"},{"key":"pcbi.1008297.ref008","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/1471-2164-4-7","article-title":"Sequence determinants in human polyadenylation site selection","volume":"4","author":"M Legendre","year":"2003","journal-title":"BMC Genomics"},{"key":"pcbi.1008297.ref009","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1093\/nar\/gki158","article-title":"A large-scale analysis of mRNA polyadenylation of human and mouse genes","volume":"33","author":"B Tian","year":"2005","journal-title":"Nucleic Acids Research"},{"key":"pcbi.1008297.ref010","doi-asserted-by":"crossref","first-page":"2547","DOI":"10.1093\/nar\/25.13.2547","article-title":"Alternative poly(A) site selection in complex transcription units: means to an end?","volume":"25","author":"G Edwalds-Gilbert","year":"1997","journal-title":"Nucleic Acids Research"},{"key":"pcbi.1008297.ref011","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1007\/s00438-017-1375-4","article-title":"Major splice variants and multiple polyadenylation site utilization in mRNAs encoding human translation initiation factors eIF4E1 and eIF4E3 regulate the translational regulators?","volume":"293","author":"S Mrvov\u00e1","year":"2018","journal-title":"Molecular Genetics and Genomics"},{"key":"pcbi.1008297.ref012","doi-asserted-by":"crossref","first-page":"853","DOI":"10.1016\/j.molcel.2011.08.017","article-title":"Mechanisms and Consequences of Alternative Polyadenylation","volume":"43","author":"DC Di Giammartino","year":"2011","journal-title":"Molecular Cell"},{"key":"pcbi.1008297.ref013","doi-asserted-by":"crossref","first-page":"2105","DOI":"10.1261\/rna.035899.112","article-title":"Alternative polyadenylation: New insights from global analyses","volume":"18","author":"Y Shi","year":"2012","journal-title":"RNA"},{"key":"pcbi.1008297.ref014","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1038\/nrg3482","article-title":"Alternative cleavage and polyadenylation: extent, regulation and function","volume":"14","author":"R Elkon","year":"2013","journal-title":"Nature Reviews Genetics"},{"key":"pcbi.1008297.ref015","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/j.tibs.2013.03.005","article-title":"Alternative cleavage and polyadenylation: the long and short of it","volume":"38","author":"B Tian","year":"2013","journal-title":"Trends in Biochemical Sciences"},{"key":"pcbi.1008297.ref016","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/j.tcb.2015.10.012","article-title":"Evolution and Biological Roles of Alternative 3\u2019UTRs","volume":"26","author":"C Mayr","year":"2016","journal-title":"Trends in Cell Biology"},{"key":"pcbi.1008297.ref017","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1158\/1541-7786.MCR-15-0489","article-title":"Alternative Polyadenylation: Another Foe in Cancer","volume":"14","author":"AE Erson-Bensan","year":"2016","journal-title":"Molecular Cancer Research"},{"key":"pcbi.1008297.ref018","doi-asserted-by":"crossref","first-page":"53","DOI":"10.3389\/fendo.2013.00053","article-title":"Alterations in Polyadenylation and Its Implications for Endocrine Disease","volume":"4","author":"A Rehfeld","year":"2013","journal-title":"Frontiers in Endocrinology"},{"key":"pcbi.1008297.ref019","first-page":"5061","article-title":"Role of p53 mutations in endocrine tumorigenesis: mutation detection by polymerase chain reaction-single strand conformation polymorphism","volume":"52","author":"K Yoshimoto","year":"1992","journal-title":"Cancer research"},{"key":"pcbi.1008297.ref020","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1261\/rna.055681.115","article-title":"Poly(A) code analyses reveal key determinants for tissue-specific mRNA alternative polyadenylation","volume":"22","author":"L Weng","year":"2016","journal-title":"RNA"},{"key":"pcbi.1008297.ref021","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1261\/rna.2581711","article-title":"Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq","volume":"17","author":"PJ Shepard","year":"2011","journal-title":"RNA"},{"key":"pcbi.1008297.ref022","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1038\/nature09616","article-title":"Formation, regulation and evolution of Caenorhabditis elegans 3\u2019UTRs","volume":"469","author":"CH Jan","year":"2011","journal-title":"Nature"},{"key":"pcbi.1008297.ref023","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1101\/gr.115295.110","article-title":"Differential genome-wide profiling of tandem 3\u2019 UTRs among human breast cancer and normal cells by high-throughput sequencing","volume":"21","author":"Y Fu","year":"2011","journal-title":"Genome Research"},{"key":"pcbi.1008297.ref024","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1038\/nmeth.2288","article-title":"Analysis of alternative cleavage and polyadenylation by 3\u2032 region extraction and deep sequencing","volume":"10","author":"M Hoque","year":"2013","journal-title":"Nature Methods"},{"issue":"13","key":"pcbi.1008297.ref025","doi-asserted-by":"crossref","first-page":"i108","DOI":"10.1093\/bioinformatics\/btt233","article-title":"Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation","volume":"29","author":"D Hafez","year":"2013","journal-title":"Bioinformatics"},{"key":"pcbi.1008297.ref026","first-page":"84","article-title":"An in-silico method for prediction of polyadenylation signals in human sequences","volume":"14","author":"H Liu","year":"2003","journal-title":"Genome informatics International Conference on Genome Informatics"},{"key":"pcbi.1008297.ref027","doi-asserted-by":"crossref","first-page":"2320","DOI":"10.1093\/bioinformatics\/btl394","article-title":"Prediction of mRNA polyadenylation sites by support vector machine","volume":"22","author":"Y Cheng","year":"2006","journal-title":"Bioinformatics"},{"key":"pcbi.1008297.ref028","doi-asserted-by":"crossref","first-page":"i316","DOI":"10.1093\/bioinformatics\/btt218","article-title":"Poly(A) motif prediction using spectral latent features from human DNA sequences","volume":"29","author":"B Xie","year":"2013","journal-title":"Bioinformatics"},{"key":"pcbi.1008297.ref029","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1093\/bioinformatics\/btr602","article-title":"Dragon PolyA Spotter: predictor of poly(A) motifs within human genomic DNA sequences","volume":"28","author":"M Kalkatawi","year":"2012","journal-title":"Bioinformatics"},{"key":"pcbi.1008297.ref030","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1186\/s12864-017-4033-7","article-title":"Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA","volume":"18","author":"A Magana-Mora","year":"2017","journal-title":"BMC Genomics"},{"key":"pcbi.1008297.ref031","doi-asserted-by":"crossref","first-page":"24340","DOI":"10.1109\/ACCESS.2018.2825996","article-title":"DeepPolyA: A Convolutional Neural Network Approach for Polyadenylation Site Prediction","volume":"6","author":"X Gao","year":"2018","journal-title":"IEEE Access"},{"key":"pcbi.1008297.ref032","article-title":"DeeReCT-PolyA: a robust and generic deep learning method for PAS identification","author":"Z Xia","year":"2018","journal-title":"Bioinformatics"},{"issue":"7","key":"pcbi.1008297.ref033","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1093\/bioinformatics\/bty752","article-title":"DeepGSR: an optimized deep-learning structure for the recognition of genomic signals and regions","volume":"35","author":"M Kalkatawi","year":"2018","journal-title":"Bioinformatics"},{"key":"pcbi.1008297.ref034","article-title":"SANPolyA: a deep learning method for identifying Poly(A) signals","author":"H Yu","year":"2020","journal-title":"Bioinformatics"},{"issue":"6","key":"pcbi.1008297.ref035","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.1101\/gr.132563.111","article-title":"A quantitative atlas of polyadenylation in five mammals","volume":"22","author":"A Derti","year":"2012","journal-title":"Genome research"},{"key":"pcbi.1008297.ref036","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/bib\/bbu011","article-title":"Genome-wide identification and predictive modeling of polyadenylation sites in eukaryotes","volume":"16","author":"G Ji","year":"2015","journal-title":"Briefings in Bioinformatics"},{"key":"pcbi.1008297.ref037","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1002\/wrna.116","article-title":"Signals for pre-mRNA cleavage and polyadenylation","volume":"3","author":"B Tian","year":"2012","journal-title":"Wiley Interdisciplinary Reviews: RNA"},{"key":"pcbi.1008297.ref038","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1002\/wrna.59","article-title":"Alternative polyadenylation and gene expression regulation in plants","volume":"2","author":"D Xing","year":"2011","journal-title":"Wiley Interdisciplinary Reviews: RNA"},{"key":"pcbi.1008297.ref039","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"N Srivastava","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"pcbi.1008297.ref040","first-page":"2096","article-title":"Domain-adversarial training of neural networks","volume":"17","author":"Y Ganin","year":"2016","journal-title":"The Journal of Machine Learning Research"},{"key":"pcbi.1008297.ref041","doi-asserted-by":"crossref","unstructured":"Haoliang Li SWACK Sinno Jialin Pan. Domain Generalization with Adversarial Feature Learning. 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2018; p. 5400\u20135409.","DOI":"10.1109\/CVPR.2018.00566"},{"key":"pcbi.1008297.ref042","first-page":"54","article-title":"Removing Confounding Factors Associated Weights in Deep Neural Networks Improves the Prediction Accuracy for Healthcare Applications","volume":"24","author":"H Wang","year":"2019","journal-title":"Pacific Symposium on Biocomputing Pacific Symposium on Biocomputing"},{"key":"pcbi.1008297.ref043","doi-asserted-by":"crossref","unstructured":"Carlucci FM, Russo P, Tommasi T, Caputo B. Hallucinating agnostic images to generalize across domains. In: 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE; 2019. p. 3227\u20133234.","DOI":"10.1109\/ICCVW.2019.00403"},{"key":"pcbi.1008297.ref044","unstructured":"Wang H, He Z, Lipton ZL, Xing EP. Learning Robust Representations by Projecting Superficial Statistics Out. In: International Conference on Learning Representations; 2019. Available from: https:\/\/openreview.net\/forum?id=rJEjjoR9K7."},{"key":"pcbi.1008297.ref045","unstructured":"Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:14126980. 2014;."},{"key":"pcbi.1008297.ref046","doi-asserted-by":"crossref","first-page":"1427","DOI":"10.1101\/gr.237826.118","article-title":"A compendium of conserved cleavage and polyadenylation events in mammalian genes","volume":"28","author":"R Wang","year":"2018","journal-title":"Genome Research"},{"key":"pcbi.1008297.ref047","doi-asserted-by":"crossref","unstructured":"Barandela R, Valdovinos RM, S\u00e1nchez JS, Ferri FJ. The imbalanced training sample problem: Under or over sampling? In: Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR). Springer; 2004. p. 806\u2013814.","DOI":"10.1007\/978-3-540-27868-9_88"},{"key":"pcbi.1008297.ref048","unstructured":"Hensman P, Masko D. The impact of imbalanced training data for convolutional neural networks. Degree Project in Computer Science, KTH Royal Institute of Technology. 2015;."},{"issue":"6","key":"pcbi.1008297.ref049","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"WebLogo: a sequence logo generator","volume":"14","author":"GE Crooks","year":"2004","journal-title":"Genome research"},{"issue":"17","key":"pcbi.1008297.ref050","doi-asserted-by":"crossref","first-page":"1770","DOI":"10.1101\/gad.17268411","article-title":"Ending the message: poly (A) signals then and now","volume":"25","author":"NJ Proudfoot","year":"2011","journal-title":"Genes & development"},{"issue":"16","key":"pcbi.1008297.ref051","doi-asserted-by":"crossref","first-page":"2796","DOI":"10.1093\/bioinformatics\/btz015","article-title":"i6mA-Pred: Identifying DNA N6-methyladenine sites in the rice genome","volume":"35","author":"W Chen","year":"2019","journal-title":"Bioinformatics"},{"issue":"23","key":"pcbi.1008297.ref052","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"L Fu","year":"2012","journal-title":"Bioinformatics"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008297","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,11,17]],"date-time":"2020-11-17T00:00:00Z","timestamp":1605571200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008297","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,17]],"date-time":"2020-11-17T13:45:42Z","timestamp":1605620742000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008297"}},"subtitle":[],"editor":[{"given":"Zhaolei","family":"Zhang","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,11,5]]},"references-count":52,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,11,5]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008297","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,5]]}}}