{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T15:04:50Z","timestamp":1771340690680,"version":"3.50.1"},"reference-count":57,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2021,4,9]],"date-time":"2021-04-09T00:00:00Z","timestamp":1617926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 AI111965"],"award-info":[{"award-number":["R01 AI111965"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["DP120104460"],"award-info":[{"award-number":["DP120104460"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["LP110200333"],"award-info":[{"award-number":["LP110200333"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000925","name":"National Health and Medical Research Council","doi-asserted-by":"publisher","award":["1092262"],"award-info":[{"award-number":["1092262"]}],"id":[{"id":"10.13039\/501100000925","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["30918011104"],"award-info":[{"award-number":["30918011104"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007219","name":"Natural Science Foundation of 
Shanghai","doi-asserted-by":"publisher","award":["BK2020021304"],"award-info":[{"award-number":["BK2020021304"]}],"id":[{"id":"10.13039\/100007219","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61772273"],"award-info":[{"award-number":["61772273"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072243"],"award-info":[{"award-number":["62072243"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Knowledge of the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, regulation and gene therapy. In recent years, deep-learning-based methods for predicting DNA-protein binding from sequence data have achieved significant success. Nevertheless, the current state-of-the-art computational methods have some drawbacks associated with the use of limited datasets with insufficient experimental data. To address this, we propose a novel transfer learning-based method, termed SAResNet, which combines the self-attention mechanism and residual network structure. More specifically, the attention-driven module captures the position information of the sequence, while the residual network structure guarantees that the high-level features of the binding site can be extracted. Meanwhile, the pre-training strategy used by SAResNet improves the learning ability of the network and accelerates the convergence speed of the network during transfer learning. 
The performance of SAResNet is extensively tested on 690 datasets from the ChIP-seq experiments with an average AUC of 92.0%, which is 4.4% higher than that of the best state-of-the-art method currently available. When tested on smaller datasets, the predictive performance is more clearly improved. Overall, we demonstrate that the superior performance of DNA-protein binding prediction on DNA sequences can be achieved by combining the attention mechanism and residual structure, and a novel pipeline is accordingly developed. The proposed methodology is generally applicable and can be used to address any other sequence classification problems.<\/jats:p>","DOI":"10.1093\/bib\/bbab101","type":"journal-article","created":{"date-parts":[[2021,3,9]],"date-time":"2021-03-09T12:26:56Z","timestamp":1615292816000},"source":"Crossref","is-referenced-by-count":44,"title":["SAResNet: self-attention residual network for predicting DNA-protein binding"],"prefix":"10.1093","volume":"22","author":[{"given":"Long-Chen","family":"Shen","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, China"}]},{"given":"Yan","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, China"}]},{"given":"Jiangning","family":"Song","sequence":"additional","affiliation":[{"name":"Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia"}]},{"given":"Dong-Jun","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Nanjing University of Science and Technology, China"}]}],"member":"286","published-online":{"date-parts":[[2021,4,9]]},"reference":[{"key":"2021111009314393600_ref1","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/j.cell.2012.12.009","article-title":"DNA-binding specificities of 
human transcription factors","volume":"152","author":"Jolma","year":"2013","journal-title":"Cell"},{"key":"2021111009314393600_ref2","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1038\/ng.406","article-title":"The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling","volume":"41","author":"Tuupanen","year":"2009","journal-title":"Nat Genet"},{"key":"2021111009314393600_ref3","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1038\/nrg1315","article-title":"Applied bioinformatics for the identification of regulatory elements","volume":"5","author":"Wasserman","year":"2004","journal-title":"Nat Rev Genet"},{"key":"2021111009314393600_ref4","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1038\/nbt1053","article-title":"Assessing computational tools for the discovery of transcription factor binding sites","volume":"23","author":"Tompa","year":"2005","journal-title":"Nat Biotechnol"},{"key":"2021111009314393600_ref5","doi-asserted-by":"crossref","first-page":"1555","DOI":"10.1093\/bioinformatics\/btw024","article-title":"TFBSTools: an R\/bioconductor package for transcription factor binding site analysis","volume":"32","author":"Tan","year":"2016","journal-title":"Bioinformatics"},{"key":"2021111009314393600_ref6","doi-asserted-by":"crossref","first-page":"1907","DOI":"10.1101\/gr.133306.111","article-title":"Transcription factor redundancy and tissue-specific regulation: evidence from functional and physical network connectivity","volume":"22","author":"Kuntz","year":"2012","journal-title":"Genome Res"},{"key":"2021111009314393600_ref7","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1186\/1471-2105-8-463","article-title":"Identification of DNA-binding proteins using support vector machines and evolutionary profiles","volume":"8","author":"Kumar","year":"2007","journal-title":"BMC 
Bioinform"},{"key":"2021111009314393600_ref8","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1016\/j.cell.2018.01.029","article-title":"The human transcription factors","volume":"172","author":"Lambert","year":"2018","journal-title":"Cell"},{"key":"2021111009314393600_ref9","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1016\/j.csbj.2018.10.007","article-title":"iGHBP: computational identification of growth hormone binding proteins from sequences using extremely randomised tree","volume":"16","author":"Basith","year":"2018","journal-title":"Comput Struct Biotechnol J"},{"key":"2021111009314393600_ref10","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1038\/nrg3306","article-title":"ChIP\u2013seq and beyond: new and improved methodologies to detect and characterize protein\u2013DNA interactions","volume":"13","author":"Furey","year":"2012","journal-title":"Nat Rev Genet"},{"key":"2021111009314393600_ref11","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/nmeth.4143","article-title":"SMiLE-seq identifies binding motifs of single and dimeric transcription factors","volume":"14","author":"Isakova","year":"2017","journal-title":"Nat Methods"},{"key":"2021111009314393600_ref12","first-page":"65","article-title":"Computational approaches for predicting the binding sites and understanding the recognition mechanism of protein\u2013DNA complexes. 
Advances in protein chemistry and structural biology","volume":"91","author":"Gromiha","year":"2013","journal-title":"Elsevier"},{"key":"2021111009314393600_ref13","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.ygeno.2018.01.005","article-title":"iDNA6mA-PseKNC: identifying DNA N6-methyladenosine sites by incorporating nucleotide physicochemical properties into PseKNC","volume":"111","author":"Feng","year":"2019","journal-title":"Genomics"},{"key":"2021111009314393600_ref14","doi-asserted-by":"crossref","first-page":"1944","DOI":"10.18632\/oncotarget.23099","article-title":"DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest","volume":"9","author":"Manavalan","year":"2018","journal-title":"Oncotarget"},{"key":"2021111009314393600_ref15","doi-asserted-by":"crossref","first-page":"e153","DOI":"10.1093\/nar\/gkt574","article-title":"DNA motif elucidation using belief propagation","volume":"41","author":"Wong","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2021111009314393600_ref16","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1003711","article-title":"Enhanced regulatory sequence prediction using gapped k-mer features","volume":"10","author":"Ghandi","year":"2014","journal-title":"PLoS Comput Biol"},{"key":"2021111009314393600_ref17","first-page":"1137","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence","author":"Ren","year":"2017"},{"key":"2021111009314393600_ref18","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2999182","article-title":"Coarse-to-fine cnn for image super-resolution","author":"Tian","year":"2020","journal-title":"IEEE Transactions on Multimedia"},{"key":"2021111009314393600_ref19","first-page":"3431","volume-title":"Proceedings of the IEEE conference on computer vision and pattern 
recognition","author":"Long","year":"2015"},{"key":"2021111009314393600_ref20","doi-asserted-by":"crossref","first-page":"1437","DOI":"10.1093\/bib\/bbz081","article-title":"Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning","volume":"21","author":"Hong","year":"2020","journal-title":"Brief Bioinform"},{"key":"2021111009314393600_ref21","doi-asserted-by":"crossref","first-page":"1825","DOI":"10.1093\/bib\/bbz120","article-title":"Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery","volume":"21","author":"Hong","year":"2020","journal-title":"Brief Bioinform"},{"key":"2021111009314393600_ref22","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2021111009314393600_ref23","doi-asserted-by":"crossref","first-page":"i121","DOI":"10.1093\/bioinformatics\/btw255","article-title":"Convolutional neural network architectures for predicting DNA\u2013protein binding","volume":"32","author":"Zeng","year":"2016","journal-title":"Bioinformatics"},{"key":"2021111009314393600_ref24","doi-asserted-by":"crossref","first-page":"1405","DOI":"10.1093\/bioinformatics\/btz768","article-title":"Expectation pooling: an effective and interpretable pooling method for predicting DNA\u2013protein binding","volume":"36","author":"Luo","year":"2020","journal-title":"Bioinformatics"},{"key":"2021111009314393600_ref25","volume-title":"The EM algorithm and extensions","author":"McLachlan","year":"2007"},{"key":"2021111009314393600_ref26","doi-asserted-by":"crossref","first-page":"15270","DOI":"10.1038\/s41598-018-33321-1","article-title":"Recurrent neural network for predicting transcription factor 
binding sites","volume":"8","author":"Shen","year":"2018","journal-title":"Sci Rep"},{"key":"2021111009314393600_ref27","doi-asserted-by":"crossref","first-page":"1903","DOI":"10.1145\/3097983.3098088","volume-title":"Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining","author":"Ma","year":"2017"},{"key":"2021111009314393600_ref28","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1007\/s13042-019-00990-x","article-title":"DeepSite: bidirectional LSTM and CNN models for predicting DNA\u2013protein binding","volume":"11","author":"Zhang","year":"2020","journal-title":"Int J Mach Learn Cybern"},{"key":"2021111009314393600_ref29","first-page":"126","volume-title":"International Conference on Intelligent Science and Big Data Engineering","author":"Bao","year":"2019"},{"key":"2021111009314393600_ref30","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"Consortium","year":"2012","journal-title":"Nature"},{"key":"2021111009314393600_ref31","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning\u2013based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat Methods"},{"key":"2021111009314393600_ref32","doi-asserted-by":"crossref","first-page":"W39","DOI":"10.1093\/nar\/gkv416","article-title":"The MEME suite","volume":"43","author":"Bailey","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2021111009314393600_ref33","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/TNB.2016.2555951","article-title":"Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning","volume":"15","author":"Liu","year":"2016","journal-title":"IEEE Trans 
Nanobioscience"},{"key":"2021111009314393600_ref34","doi-asserted-by":"crossref","first-page":"476","DOI":"10.3389\/fmicb.2018.00476","article-title":"PVP-SVM: sequence-based prediction of phage virion proteins using a support vector machine","volume":"9","author":"Manavalan","year":"2018","journal-title":"Front Microbiol"},{"key":"2021111009314393600_ref35","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1016\/j.omtn.2019.04.019","article-title":"Meta-4mCpred: a sequence-based meta-predictor for accurate DNA 4mC site prediction using effective feature representation","volume":"16","author":"Manavalan","year":"2019","journal-title":"Molecular Therapy-Nucleic Acids"},{"key":"2021111009314393600_ref36","doi-asserted-by":"crossref","first-page":"7606","DOI":"10.1093\/nar\/gkt544","article-title":"Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins","volume":"41","author":"Nagarajan","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2021111009314393600_ref37","doi-asserted-by":"crossref","first-page":"i269","DOI":"10.1093\/bioinformatics\/btz339","article-title":"Comprehensive evaluation of deep learning architectures for prediction of DNA\/RNA sequence binding specificities","volume":"35","author":"Trabelsi","year":"2019","journal-title":"Bioinformatics"},{"key":"2021111009314393600_ref38","first-page":"3156","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Wang","year":"2017"},{"key":"2021111009314393600_ref39","first-page":"5446","volume-title":"Thirty-Second AAAI Conference on Artificial Intelligence","author":"Shen","year":"2018"},{"key":"2021111009314393600_ref40","first-page":"7794","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Wang","year":"2018"},{"key":"2021111009314393600_ref41","first-page":"448","article-title":"Batch normalization: accelerating deep network training by 
reducing internal covariate shift","author":"Ioffe","year":"2015","journal-title":"International conference on machine learning"},{"key":"2021111009314393600_ref42","first-page":"507","volume-title":"Proceedings of The 33rd International Conference on Machine Learning","author":"Liu","year":"2016"},{"key":"2021111009314393600_ref43","first-page":"770","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"He","year":"2016"},{"key":"2021111009314393600_ref44","first-page":"630","volume-title":"European conference on computer vision","author":"He","year":"2016"},{"key":"2021111009314393600_ref45","article-title":"Empirical evaluation of rectified activations in convolutional network","author":"Xu","journal-title":"arXiv"},{"key":"2021111009314393600_ref46","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J Mach Learn Res"},{"key":"2021111009314393600_ref47","article-title":"Tensorflow: large-scale machine learning on heterogeneous distributed systems","author":"Abadi","year":"2016","journal-title":"arXiv"},{"key":"2021111009314393600_ref48","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2017","journal-title":"arXiv"},{"key":"2021111009314393600_ref49","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"10","author":"Pan","year":"2010","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2021111009314393600_ref50","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan","year":"2015","journal-title":"arXiv"},{"key":"2021111009314393600_ref51","doi-asserted-by":"crossref","first-page":"1184","DOI":"10.1109\/TCBB.2018.2819660","article-title":"High-order convolutional neural network architecture for predicting DNA-protein binding 
sites","volume":"16","author":"Zhang","year":"2018","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2021111009314393600_ref52","doi-asserted-by":"crossref","first-page":"2205","DOI":"10.1093\/bioinformatics\/btw203","article-title":"gkmSVM: an R package for gapped-kmer SVM","volume":"32","author":"Ghandi","year":"2016","journal-title":"Bioinformatics"},{"key":"2021111009314393600_ref53","doi-asserted-by":"crossref","first-page":"1332","DOI":"10.3390\/cells8111332","article-title":"4mCpred-EL: an ensemble learning framework for identification of DNA N4-methylcytosine sites in the mouse genome","volume":"8","author":"Manavalan","year":"2019","journal-title":"Cells"},{"key":"2021111009314393600_ref54","doi-asserted-by":"crossref","first-page":"2796","DOI":"10.1093\/bioinformatics\/btz015","article-title":"i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome","volume":"35","author":"Chen","year":"2019","journal-title":"Bioinformatics"},{"key":"2021111009314393600_ref55","first-page":"1","article-title":"Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation","author":"Xu","year":"2015","journal-title":"BMC Syst Biol"},{"key":"2021111009314393600_ref56","article-title":"iRNA-PseU: identifying RNA pseudouridine sites","volume":"5","author":"Chen","year":"2016","journal-title":"Molecular Therapy-Nucleic Acids"},{"key":"2021111009314393600_ref57","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1002\/prot.21677","article-title":"Prediction of RNA binding sites in a protein using SVM and PSSM profile","volume":"71","author":"Kumar","year":"2008","journal-title":"Proteins: Structure, Function, and Bioinformatics"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/5\/bbab101\/41119929\/bbab101.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/5\/bbab101\/41119929\/bbab101.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,22]],"date-time":"2023-10-22T19:27:40Z","timestamp":1698002860000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab101\/6218493"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,9]]},"references-count":57,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab101","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,9]]},"published":{"date-parts":[[2021,4,9]]},"article-number":"bbab101"}}