{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T05:53:22Z","timestamp":1781157202549,"version":"3.54.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2023,6,16]],"date-time":"2023-06-16T00:00:00Z","timestamp":1686873600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"publisher","award":["ZR2021MF098"],"award-info":[{"award-number":["ZR2021MF098"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62172248"],"award-info":[{"award-number":["62172248"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Precise targeting of transcription factor binding sites (TFBSs) is essential to comprehending transcriptional regulatory processes and investigating cellular function. Although several deep learning algorithms have been created to predict TFBSs, the models\u2019 intrinsic mechanisms and prediction results are difficult to explain. There is still room for improvement in prediction performance. We present DeepSTF, a unique deep-learning architecture for predicting TFBSs by integrating DNA sequence and shape profiles. We use the improved transformer encoder structure for the first time in the TFBSs prediction approach. DeepSTF extracts DNA higher-order sequence features using stacked convolutional neural networks\u00a0(CNNs), whereas rich DNA shape profiles are extracted by combining improved transformer encoder structure and bidirectional long short-term memory\u00a0(Bi-LSTM), and, finally, the derived higher-order sequence features and representative shape profiles are integrated into the channel dimension to achieve accurate TFBSs prediction. Experiments on 165 ENCODE chromatin immunoprecipitation sequencing (ChIP-seq) datasets show that DeepSTF considerably outperforms several state-of-the-art algorithms in predicting TFBSs, and we explain the usefulness of the transformer encoder structure and the combined strategy using sequence features and shape profiles in capturing multiple dependencies and learning essential features. In addition, this paper examines the significance of DNA shape features predicting TFBSs. The source code of DeepSTF is available at https:\/\/github.com\/YuBinLab-QUST\/DeepSTF\/.<\/jats:p>","DOI":"10.1093\/bib\/bbad231","type":"journal-article","created":{"date-parts":[[2023,6,17]],"date-time":"2023-06-17T03:06:51Z","timestamp":1686971211000},"source":"Crossref","is-referenced-by-count":37,"title":["DeepSTF: predicting transcription factor binding sites by interpretable deep neural networks combining sequence and shape"],"prefix":"10.1093","volume":"24","author":[{"given":"Pengju","family":"Ding","sequence":"first","affiliation":[{"name":"Qingdao University of Science and Technology , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yifei","family":"Wang","sequence":"additional","affiliation":[{"name":"Qingdao University of Science and Technology , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xinyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Qingdao University of Science and Technology , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology (KAUST) , Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guozhu","family":"Liu","sequence":"additional","affiliation":[{"name":"Qingdao University of Science and Technology , China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2453-7852","authenticated-orcid":false,"given":"Bin","family":"Yu","sequence":"additional","affiliation":[{"name":"Qingdao University of Science and Technology , China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2023,6,16]]},"reference":[{"issue":"6","key":"2023072020155501300_ref1","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1109\/TST.2014.6961027","article-title":"Structure-based prediction of transcription factor binding sites","volume":"19","author":"Guo","year":"2014","journal-title":"Tsinghua Sci Technol"},{"issue":"7414","key":"2023072020155501300_ref2","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"Dunham","year":"2012","journal-title":"Nature"},{"key":"2023072020155501300_ref3","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.shpsc.2018.10.008","article-title":"ENCODE and the parts of the human genome","volume":"72","author":"Kaiser","year":"2018","journal-title":"Stud Hist Philos Biol Biomed Sci"},{"issue":"9","key":"2023072020155501300_ref4","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1016\/j.tibs.2014.07.002","article-title":"Absence of a simple code: how transcription factors read the genome","volume":"39","author":"Slattery","year":"2014","journal-title":"Trends Biochem Sci"},{"issue":"11\u201312","key":"2023072020155501300_ref5","doi-asserted-by":"crossref","first-page":"194765","DOI":"10.1016\/j.bbagrm.2021.194765","article-title":"A GO catalogue of human DNA-binding transcription factors","volume":"1864","author":"Lovering","year":"2021","journal-title":"Biochim Biophys Acta Gene Regul Mech"},{"issue":"4","key":"2023072020155501300_ref6","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1016\/j.cell.2018.01.029","article-title":"The human transcription factors","volume":"172","author":"Lambert","year":"2018","journal-title":"Cell"},{"issue":"6","key":"2023072020155501300_ref7","doi-asserted-by":"crossref","first-page":"1408","DOI":"10.1016\/j.cell.2011.11.013","article-title":"Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution","volume":"147","author":"Rhee","year":"2011","journal-title":"Cell"},{"key":"2023072020155501300_ref8","doi-asserted-by":"crossref","first-page":"S7","DOI":"10.1186\/1471-2105-12-S1-S7","article-title":"Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery","volume":"12","author":"Han","year":"2011","journal-title":"BMC Bioinform"},{"issue":"6","key":"2023072020155501300_ref9","doi-asserted-by":"crossref","first-page":"1592","DOI":"10.1109\/TCBB.2011.79","article-title":"Molecular pattern discovery based on penalized matrix decomposition","volume":"8","author":"Zheng","year":"2011","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023072020155501300_ref10","doi-asserted-by":"crossref","first-page":"1642","DOI":"10.1109\/ACCESS.2016.2552478","article-title":"Feature selection for hidden Markov models and hidden semi-Markov models","volume":"4","author":"Adams","year":"2016","journal-title":"IEEE Access"},{"issue":"12","key":"2023072020155501300_ref11","doi-asserted-by":"crossref","first-page":"1580","DOI":"10.1016\/j.patrec.2012.04.003","article-title":"Dynamic random forests","volume":"33","author":"Bernard","year":"2012","journal-title":"Pattern Recogn Lett"},{"issue":"1","key":"2023072020155501300_ref12","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1186\/s12859-022-04734-7","article-title":"Modeling binding specificities of transcription factor pairs with random forests","volume":"23","author":"Antikainen","year":"2022","journal-title":"BMC Bioinform"},{"issue":"W1","key":"2023072020155501300_ref13","doi-asserted-by":"crossref","first-page":"W544","DOI":"10.1093\/nar\/gkt519","article-title":"Kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets","volume":"41","author":"Fletez-Brant","year":"2013","journal-title":"Nucleic Acids Res"},{"issue":"S7","key":"2023072020155501300_ref14","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1186\/s12859-019-2735-3","article-title":"MD-SVM: a novel SVM-based algorithm for the motif discovery of transcription factor binding sites","volume":"20","author":"Hu","year":"2019","journal-title":"BMC Bioinform"},{"issue":"5","key":"2023072020155501300_ref15","doi-asserted-by":"crossref","first-page":"bbab101","DOI":"10.1093\/bib\/bbab101","article-title":"SAResNet: self-attention residual network for predicting DNA-protein binding","volume":"22","author":"Shen","year":"2021","journal-title":"Brief Bioinform"},{"issue":"1","key":"2023072020155501300_ref16","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1186\/s12859-020-03952-1","article-title":"DeepGRN: prediction of transcription factor binding site across cell-types using attention-based deep neural networks","volume":"22","author":"Chen","year":"2021","journal-title":"BMC Bioinform"},{"key":"2023072020155501300_ref17","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbad036","article-title":"Cooperation of local features and global representations by a dual-branch network for transcription factor binding sites prediction","volume":"24","author":"Yu","year":"2023","journal-title":"Brief Bioinform"},{"key":"2023072020155501300_ref18","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1016\/j.neucom.2019.01.078","article-title":"Bidirectional LSTM with attention mechanism and convolutional layer for text classification","volume":"337","author":"Liu","year":"2019","journal-title":"Neurocomputing"},{"issue":"2","key":"2023072020155501300_ref19","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/j.cell.2015.02.008","article-title":"Deconvolving the recognition of DNA shape from sequence","volume":"161","author":"Abe","year":"2015","journal-title":"Cell"},{"issue":"2","key":"2023072020155501300_ref20","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1109\/TCBB.2019.2947461","article-title":"Predicting in-vitro transcription factor binding sites using DNA sequence plus shape","volume":"18","author":"Zhang","year":"2021","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023072020155501300_ref21","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/j.omtn.2021.02.014","article-title":"Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture","volume":"24","author":"Wang","year":"2021","journal-title":"Mol Ther Nucleic Acids"},{"issue":"1","key":"2023072020155501300_ref22","doi-asserted-by":"crossref","first-page":"bbab525","DOI":"10.1093\/bib\/bbab525","article-title":"A novel convolution attention model for predicting transcription factor binding sites by combination of sequence and shape","volume":"23","author":"Zhang","year":"2022","journal-title":"Brief Bioinform"},{"issue":"8","key":"2023072020155501300_ref23","doi-asserted-by":"crossref","first-page":"5789","DOI":"10.1007\/s10462-021-09958-2","article-title":"Transformer models for text-based emotion detection: a review of BERT-based approaches","volume":"54","author":"Acheampong","year":"2021","journal-title":"Artif Intell Rev"},{"key":"2023072020155501300_ref24","doi-asserted-by":"crossref","first-page":"570","DOI":"10.1162\/tacl_a_00385","article-title":"Decoupling the role of data, attention, and losses in multimodal transformers","volume":"9","author":"Hendricks","year":"2021","journal-title":"Trans Assoc Comput Linguistics"},{"issue":"10S","key":"2023072020155501300_ref25","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1145\/3505244","article-title":"Transformers in vision: a survey","volume":"54","author":"Khan","year":"2022","journal-title":"ACM Comput Surv"},{"issue":"21","key":"2023072020155501300_ref26","doi-asserted-by":"crossref","first-page":"14034","DOI":"10.3390\/su142114034","article-title":"A novel multimodal species distribution model fusing remote sensing images and environmental features","volume":"14","author":"Zhang","year":"2022","journal-title":"Sustainability"},{"key":"2023072020155501300_ref27","doi-asserted-by":"crossref","first-page":"101304","DOI":"10.1016\/j.csl.2021.101304","article-title":"MS-transformer: introduce multiple structural priors into a unified transformer for encoding sentences","volume":"72","author":"Qi","year":"2022","journal-title":"Comput Speech Lang"},{"issue":"5","key":"2023072020155501300_ref28","doi-asserted-by":"crossref","first-page":"870","DOI":"10.1049\/cje.2021.00.221","article-title":"AttentionSplice: an interpretable multi-head self-attention based hybrid deep learning model in splice site prediction","volume":"31","author":"Yan","year":"2022","journal-title":"Chin J Electr"},{"key":"2023072020155501300_ref29","doi-asserted-by":"crossref","first-page":"i121","DOI":"10.1093\/bioinformatics\/btw255","article-title":"Convolutional neural network architectures for predicting DNA-protein binding","volume":"32","author":"Zeng","year":"2016","journal-title":"Bioinformatics"},{"issue":"8","key":"2023072020155501300_ref30","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat Biotechnol"},{"issue":"W1","key":"2023072020155501300_ref31","doi-asserted-by":"crossref","first-page":"W39","DOI":"10.1093\/nar\/gkv416","article-title":"The MEME suite","volume":"43","author":"Bailey","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023072020155501300_ref32","doi-asserted-by":"crossref","first-page":"91855","DOI":"10.1109\/ACCESS.2022.3197662","article-title":"Bangla-BERT: transformer-based efficient model for transfer learning and language understanding","volume":"10","author":"Kowsher","year":"2022","journal-title":"IEEE Access"},{"key":"2023072020155501300_ref33","doi-asserted-by":"crossref","first-page":"186657","DOI":"10.1109\/ACCESS.2019.2961375","article-title":"Improving the efficiency of encoder-decoder architecture for pixel-level crack detection","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"issue":"12","key":"2023072020155501300_ref34","doi-asserted-by":"crossref","first-page":"1379","DOI":"10.3390\/e22121379","article-title":"Adam and the ants: on the influence of the optimization algorithm on the detectability of DNN watermarks","volume":"22","author":"Cortinas-Lorenzo","year":"2020","journal-title":"Entropy"},{"issue":"11","key":"2023072020155501300_ref35","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1093\/nar\/gkw226","article-title":"DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences","volume":"44","author":"Quang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023072020155501300_ref36","doi-asserted-by":"crossref","first-page":"1038","DOI":"10.1038\/s41467-017-01188-x","article-title":"Genome-wide prediction of DNase I hypersensitivity using gene expression","volume":"8","author":"Zhou","year":"2017","journal-title":"Nat Commun"},{"issue":"23","key":"2023072020155501300_ref37","doi-asserted-by":"crossref","first-page":"1465","DOI":"10.2217\/epi-2022-0437","article-title":"The landscape of histone modifications in epigenomics since 2020","volume":"14","author":"Shirvaliloo","year":"2022","journal-title":"Epigenomics"},{"issue":"5","key":"2023072020155501300_ref38","doi-asserted-by":"crossref","first-page":"e202201768","DOI":"10.26508\/lsa.202201768","article-title":"Disruption of polyhomeotic polymerization decreases nucleosome occupancy and alters genome accessibility","volume":"6","author":"Amin","year":"2023","journal-title":"Life Sci Alliance"},{"issue":"17","key":"2023072020155501300_ref39","doi-asserted-by":"crossref","first-page":"4070","DOI":"10.1093\/bioinformatics\/btac489","article-title":"Identifying modifications on DNA-bound histones with joint deep learning of multiple binding sites in DNA sequence","volume":"38","author":"Li","year":"2022","journal-title":"Bioinformatics"},{"key":"2023072020155501300_ref40","doi-asserted-by":"crossref","first-page":"574196","DOI":"10.3389\/fgene.2021.574196","article-title":"Identification of the functions and prognostic values of RNA binding proteins in bladder cancer","volume":"12","author":"Wu","year":"2021","journal-title":"Front Genet"},{"issue":"04","key":"2023072020155501300_ref41","doi-asserted-by":"crossref","first-page":"2250006","DOI":"10.1142\/S0219720022500068","article-title":"DeepBtoD: improved RNA-binding proteins prediction via integrated deep learning","volume":"20","author":"Du","year":"2022","journal-title":"J Bioinform Comput Biol"},{"issue":"2","key":"2023072020155501300_ref42","doi-asserted-by":"crossref","first-page":"bbab564","DOI":"10.1093\/bib\/bbab564","article-title":"AlphaFold2-aware protein-DNA binding site prediction using graph transformer","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"issue":"3\u20134","key":"2023072020155501300_ref43","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1101\/gad.348993.121","article-title":"Distinct structural bases for sequence-specific DNA binding by mammalian BEN domain proteins","volume":"36","author":"Zheng","year":"2022","journal-title":"Genes Dev"},{"issue":"1","key":"2023072020155501300_ref44","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac486","article-title":"RLBind: a deep learning method to predict RNA-ligand binding sites","volume":"24","author":"Wang","year":"2022","journal-title":"Brief Bioinform"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/4\/bbad231\/50917196\/bbad231.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/4\/bbad231\/50917196\/bbad231.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,20]],"date-time":"2023-07-20T20:18:03Z","timestamp":1689884283000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad231\/7199560"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,16]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,7,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad231","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,7]]},"published":{"date-parts":[[2023,6,16]]},"article-number":"bbad231"}}