{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T17:18:31Z","timestamp":1779211111880,"version":"3.51.4"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T00:00:00Z","timestamp":1695340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFF1203100"],"award-info":[{"award-number":["2022YFF1203100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["12126610"],"award-info":[{"award-number":["12126610"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The interactions between nucleic acids and proteins are important in diverse biological processes. The high-quality prediction of nucleic-acid-binding sites continues to pose a significant challenge. Presently, the predictive efficacy of sequence-based methods is constrained by their exclusive consideration of sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Though protein structures predicted by AlphaFold2 could be used, the extensive computing requirement of AlphaFold2 hinders its use for genome-wide applications. 
Based on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold-predicted structures. Here, the predicted protein structures are employed to construct a protein structural graph with residues as nodes and spatially neighboring residue pairs as edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict binding sites with DNA and RNA, respectively. Through comprehensive tests on DNA\/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and be comparable with structure-based methods. Moreover, the prediction was shown to be useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. 
The datasets, codes, and trained models are available at https:\/\/github.com\/biomed-AI\/nucleic-acid-binding.<\/jats:p>","DOI":"10.1093\/bib\/bbad360","type":"journal-article","created":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T18:41:31Z","timestamp":1697136091000},"source":"Crossref","is-referenced-by-count":29,"title":["Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures"],"prefix":"10.1093","volume":"24","author":[{"given":"Yidong","family":"Song","sequence":"first","affiliation":[{"name":"Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University , Guangzhou 510000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qianmu","family":"Yuan","sequence":"additional","affiliation":[{"name":"Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University , Guangzhou 510000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huiying","family":"Zhao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University , Guangzhou 510000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuedong","family":"Yang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University , Guangzhou 510000 , China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,10,11]]},"reference":[{"key":"2023101218410555700_ref1","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1146\/annurev.bi.53.070184.002133","article-title":"Protein-nucleic acid interactions in transcription: a molecular 
analysis","volume":"53","author":"Hippel","year":"1984","journal-title":"Annu Rev Biochem"},{"key":"2023101218410555700_ref2","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"CATH\u2013a hierarchic classification of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2023101218410555700_ref3","doi-asserted-by":"crossref","first-page":"2306","DOI":"10.1093\/nar\/26.10.2306","article-title":"Quantitative parameters for amino acid-base interaction: implications for prediction of protein-DNA binding sites","volume":"26","author":"Mandel-Gutfreund","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref4","doi-asserted-by":"crossref","first-page":"1389","DOI":"10.1109\/TCBB.2016.2616469","article-title":"Predicting protein-DNA binding residues by weightedly combining sequence-based features and boosting multiple SVMs","volume":"14","author":"Hu","year":"2016","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023101218410555700_ref5","doi-asserted-by":"crossref","first-page":"gkx059","DOI":"10.1093\/nar\/gkx059","article-title":"DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues","volume":"45","author":"Yan","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref6","doi-asserted-by":"crossref","first-page":"bbaa397","DOI":"10.1093\/bib\/bbaa397","article-title":"NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning","volume":"22","author":"Zhang","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023101218410555700_ref7","doi-asserted-by":"crossref","first-page":"3057","DOI":"10.1021\/acs.jcim.8b00749","article-title":"DNAPred: accurate identification of DNA-binding sites from protein sequence by ensembled hyperplane-distance-based support vector 
machines","volume":"59","author":"Zhu","year":"2019","journal-title":"J Chem Inf Model"},{"key":"2023101218410555700_ref8","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1007\/978-1-4939-0366-5_9","article-title":"SPOT-Seq-RNA: predicting protein\u2013RNA complex structure and RNA-binding function by fold recognition and binding affinity prediction","volume":"1137","author":"Yang","year":"2014","journal-title":"Methods Mol Biol"},{"key":"2023101218410555700_ref9","doi-asserted-by":"crossref","first-page":"e15","DOI":"10.1093\/nar\/gkt1299","article-title":"Identifying RNA-binding residues based on evolutionary conserved structural and energetic features","volume":"42","author":"Chen","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref10","doi-asserted-by":"crossref","first-page":"3036","DOI":"10.1093\/bioinformatics\/btx350","article-title":"DeepSite: protein-binding site predictor using 3D-convolutional neural networks","volume":"33","author":"Jim\u00e9nez","year":"2017","journal-title":"Bioinformatics"},{"key":"2023101218410555700_ref11","doi-asserted-by":"crossref","first-page":"e51","DOI":"10.1093\/nar\/gkab044","article-title":"GraphBind: protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues","volume":"49","author":"Xia","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref12","doi-asserted-by":"crossref","first-page":"1885","DOI":"10.1002\/prot.24330","article-title":"DNABind: a hybrid algorithm for structure-based prediction of DNA-binding residues by combining machine learning- and template-based approaches","volume":"81","author":"Liu","year":"2013","journal-title":"Proteins"},{"key":"2023101218410555700_ref13","doi-asserted-by":"crossref","first-page":"bbab564","DOI":"10.1093\/bib\/bbab564","article-title":"AlphaFold2-aware protein\u2013DNA binding site prediction using graph 
transformer","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023101218410555700_ref14","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2023101218410555700_ref15","doi-asserted-by":"crossref","first-page":"R245","DOI":"10.1016\/S1074-5521(98)90108-9","article-title":"Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products","volume":"5","author":"Handelsman","year":"1998","journal-title":"Chem Biol"},{"key":"2023101218410555700_ref16","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"Prottrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023101218410555700_ref17","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci"},{"key":"2023101218410555700_ref18","doi-asserted-by":"crossref","first-page":"bbad117","DOI":"10.1093\/bib\/bbad117","article-title":"Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion","volume":"24","author":"Yuan","year":"2023","journal-title":"Brief Bioinform"},{"key":"2023101218410555700_ref19","doi-asserted-by":"crossref","first-page":"bbac444","DOI":"10.1093\/bib\/bbac444","article-title":"Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task 
learning","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023101218410555700_ref20","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1093\/bioinformatics\/btab643","article-title":"Structure-aware protein\u2013protein interaction site prediction using deep graph convolutional network","volume":"38","author":"Yuan","year":"2022","journal-title":"Bioinformatics"},{"key":"2023101218410555700_ref21","first-page":"1","article-title":"Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map","volume":"13","author":"Chen","year":"2021","journal-title":"J Chem"},{"key":"2023101218410555700_ref22","doi-asserted-by":"crossref","first-page":"4941","DOI":"10.1038\/s41467-019-12920-0","article-title":"A deep learning framework to predict binding preference of RNA constituents on protein surface","volume":"10","author":"Lam","year":"2019","journal-title":"Nat Commun"},{"key":"2023101218410555700_ref23","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1038\/s42256-020-0152-y","article-title":"Predicting drug\u2013protein interaction using quasi-visual question answering system","volume":"2","author":"Zheng","year":"2020","journal-title":"Nature Machine Intelligence"},{"key":"2023101218410555700_ref24","doi-asserted-by":"crossref","first-page":"3814","DOI":"10.1021\/acs.jcim.1c00475","article-title":"Protein\u2013peptide binding site detection using 3D convolutional neural networks","volume":"61","author":"Kozlovskii","year":"2021","journal-title":"J Chem Inf Model"},{"key":"2023101218410555700_ref25","article-title":"Relational inductive biases, deep learning, and graph networks","author":"Battaglia","year":"2018","journal-title":"arXiv preprint"},{"key":"2023101218410555700_ref26","article-title":"Learning from protein structure with geometric vector perceptrons","author":"Jing","year":"2021","journal-title":"International Conference on Learning 
Representations"},{"key":"2023101218410555700_ref27","doi-asserted-by":"crossref","first-page":"D1096","DOI":"10.1093\/nar\/gks966","article-title":"BioLiP: a semi-manually curated database for biologically relevant ligand\u2013protein interactions","volume":"41","author":"Yang","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref28","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref29","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2023101218410555700_ref30","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1016\/j.str.2011.02.015","article-title":"A critical comparative assessment of predictions of protein-binding sites for biologically relevant organic compounds","volume":"19","author":"Chen","year":"2011","journal-title":"Structure"},{"key":"2023101218410555700_ref31","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1093\/bib\/bbx022","article-title":"Review and comparative assessment of sequence-based predictors of protein-binding residues","volume":"19","author":"Zhang","year":"2018","journal-title":"Brief Bioinform"},{"key":"2023101218410555700_ref32","doi-asserted-by":"crossref","first-page":"W5","DOI":"10.1093\/nar\/gkn201","article-title":"NCBI BLAST: a better web interface","volume":"36","author":"Johnson","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref33","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical 
features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023101218410555700_ref34","first-page":"5485","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2023101218410555700_ref35","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"UniRef: comprehensive and non-redundant UniProt reference clusters","volume":"23","author":"Suzek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023101218410555700_ref36","first-page":"1263","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Gilmer","year":"2017"},{"key":"2023101218410555700_ref37","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1038\/s42003-022-03445-2","article-title":"PepNN: a deep attention model for the identification of peptide binding sites","volume":"5","author":"Abdin","year":"2022","journal-title":"Commun Biol"},{"key":"2023101218410555700_ref38","doi-asserted-by":"crossref","first-page":"2080","DOI":"10.1002\/prot.24100","article-title":"A new size-independent score for pairwise protein structure alignment and its application to structure classification and nucleic-acid binding prediction","volume":"80","author":"Yang","year":"2012","journal-title":"Proteins"},{"key":"2023101218410555700_ref39","doi-asserted-by":"crossref","first-page":"W438","DOI":"10.1093\/nar\/gky439","article-title":"COACH-D: improved protein\u2013ligand binding sites prediction with refined ligand-binding poses through molecular docking","volume":"46","author":"Wu","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023101218410555700_ref40","doi-asserted-by":"crossref","first-page":"930","DOI":"10.1093\/bioinformatics\/bty756","article-title":"Improving the prediction of protein\u2013nucleic acids binding residues via multiple sequence 
profiles and the consensus of complementary methods","volume":"35","author":"Su","year":"2019","journal-title":"Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad360\/52011661\/bbad360.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad360\/52011661\/bbad360.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T18:41:55Z","timestamp":1697136115000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad360\/7306822"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,22]]},"references-count":40,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,9,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad360","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,9,22]]},"article-number":"bbad360"}}