{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T14:30:39Z","timestamp":1762957839130,"version":"3.45.0"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T00:00:00Z","timestamp":1762905600000},"content-version":"vor","delay-in-days":11,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Interdisciplinary Innovative Talents Foundation from Renmin Hospital of Wuhan University","award":["JCRCZN-2022-018"],"award-info":[{"award-number":["JCRCZN-2022-018"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["2042024YXA002"],"award-info":[{"award-number":["2042024YXA002"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62272354","U23A20318","62276195"],"award-info":[{"award-number":["62272354","U23A20318","62276195"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Innovative Research Group Project of Hubei Province","award":["2024AFA017"],"award-info":[{"award-number":["2024AFA017"]}]},{"name":"Science and Technology Major Project of Hubei Province","award":["2024BAB046"],"award-info":[{"award-number":["2024BAB046"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Accurately identifying and classifying protein active sites is crucial for understanding protein mechanisms, drug design, and synthetic biology. Current methods often rely on binary classification and single-modal data, limiting their scope. To address these limitations, we propose M$^{3}$Site, a multimodal framework that integrates protein sequence embeddings, structural graph representations, and functional text annotations for residue-level, multiclass active site prediction. Built upon a curated dataset of 25 883 proteins sourced from UniProt and AlphaFold2, M$^{3}$Site leverages pretrained protein language models, equivariant graph neural networks, and biomedical language models for feature extraction. The function informed cross-attention module enables cross-modal feature fusion, while the adaptive weighted fusion mechanism balances modality contributions. A compound loss function tackles class imbalance, ensuring robust performance. Experimental results show M$^{3}$Site significantly outperforms existing models, and an interactive application has been developed to enhance its practical utility for predictions and visualizations. The dataset, source code for experiments, and interactive application are publicly available at https:\/\/github.com\/Gift-OYS\/M3Site.<\/jats:p>","DOI":"10.1093\/bib\/bbaf590","type":"journal-article","created":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T14:27:13Z","timestamp":1762957633000},"source":"Crossref","is-referenced-by-count":0,"title":["M3Site: multiclass multimodal learning for protein active site identification and classification"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-5693-7384","authenticated-orcid":false,"given":"Song","family":"Ouyang","sequence":"first","affiliation":[{"name":"Renmin Hospital of Wuhan University , Zhang Road and Jiefang Road, Wuhan, Hubei 430060,","place":["China"]},{"name":"School of Computer Science , National Engineering Research Center for Multimedia Software, Wuhan University, Bayi Road, Wuhan, Hubei 430072,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2296-6370","authenticated-orcid":false,"given":"Yong","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computer Science , National Engineering Research Center for Multimedia Software, Wuhan University, Bayi Road, Wuhan, Hubei 430072,","place":["China"]}]},{"given":"Huiyu","family":"Cai","sequence":"additional","affiliation":[{"name":"BioGeometry , North Haidian 2nd Street, Beijing 100083,","place":["China"]},{"name":"Mila - Qu\u00e9bec AI Institute , Rue Saint-Urbain, Montr\u00e9al, Qu\u00e9bec H2S 3H1,","place":["Canada"]},{"name":"Department of Computer Science and Operations Research , Universit\u00e9 de Montr\u00e9al, Bd \u00c9douard-Montpetit, Montr\u00e9al, Qu\u00e9bec H3T 1J4,","place":["Canada"]}]},{"given":"Kehua","family":"Su","sequence":"additional","affiliation":[{"name":"School of Computer Science , National Engineering Research Center for Multimedia Software, Wuhan University, Bayi Road, Wuhan, Hubei 430072,","place":["China"]}]},{"given":"Fei","family":"Liao","sequence":"additional","affiliation":[{"name":"Renmin Hospital of Wuhan University , Zhang Road and Jiefang Road, Wuhan, Hubei 430060,","place":["China"]}]},{"given":"Na","family":"Zhan","sequence":"additional","affiliation":[{"name":"Renmin Hospital of Wuhan University , Zhang Road and Jiefang Road, Wuhan, Hubei 430060,","place":["China"]}]},{"given":"Huangxuan","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science , National Engineering Research Center for Multimedia Software, Wuhan University, Bayi Road, Wuhan, Hubei 430072,","place":["China"]}]},{"given":"Tailang","family":"Yin","sequence":"additional","affiliation":[{"name":"Renmin Hospital of Wuhan University , Zhang Road and Jiefang Road, Wuhan, Hubei 430060,","place":["China"]}]},{"given":"Lin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Peking Union Medical College Hospital , No. 1 Shuaifuyuan Wangfujing Dongcheng District, Beijing 100730,","place":["China"]}]},{"given":"Dongjing","family":"Shan","sequence":"additional","affiliation":[{"name":"Southwest Medical University , Zhongshan Road, Luzhou, Sichuan 646000,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,11,12]]},"reference":[{"key":"2025111209270896700_ref1","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.cbpa.2003.11.001","article-title":"Searching for functional sites in protein structures","volume":"8","author":"Jones","year":"2004","journal-title":"Curr Opin Chem Biol"},{"key":"2025111209270896700_ref2","doi-asserted-by":"publisher","first-page":"775","DOI":"10.1093\/protein\/5.8.775","article-title":"Analysis of several key active site residues of ricin A chain by mutagenesis and X-ray crystallography","volume":"5","author":"Kim","year":"1992","journal-title":"Protein Eng Des Sel"},{"key":"2025111209270896700_ref3","doi-asserted-by":"publisher","first-page":"1106","DOI":"10.1021\/acs.accounts.5b00001","article-title":"Dynamics of protein kinases: insights from nuclear magnetic resonance","volume":"48","author":"Xiao","year":"2015","journal-title":"Acc Chem Res"},{"key":"2025111209270896700_ref4","doi-asserted-by":"publisher","first-page":"1136","DOI":"10.1038\/s41422-020-00432-2","article-title":"Resolving individual atoms of protein complex by cryo-electron microscopy","volume":"30","author":"Zhang","year":"2020","journal-title":"Cell Res"},{"key":"2025111209270896700_ref5","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1093\/bioinformatics\/btq008","article-title":"Active site prediction using evolutionary and structural information","volume":"26","author":"Sankararaman","year":"2010","journal-title":"Bioinformatics"},{"key":"2025111209270896700_ref6","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1556\/1848.2021.00315","article-title":"Protein active site prediction for early drug discovery and designing","volume":"13","author":"Yousaf","year":"2021","journal-title":"Int Rev Appl Sci Eng"},{"key":"2025111209270896700_ref7","article-title":"Intrinsic-extrinsic convolution and pooling for learning on 3D protein structures","author":"Hermosilla","journal-title":"International Conference on Learning Representations"},{"key":"2025111209270896700_ref8","article-title":"Protein representation learning by geometric structure pretraining","volume-title":"International Conference on Learning Representations","author":"Zhang","year":"2023"},{"key":"2025111209270896700_ref9","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbae055","article-title":"TransGCN: a semi-supervised graph convolution network-based framework to infer protein translocations in spatio-temporal proteomics","volume":"25","author":"Wang","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025111209270896700_ref10","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025111209270896700_ref11","doi-asserted-by":"crossref","first-page":"2102","DOI":"10.1093\/bioinformatics\/btac020","article-title":"ProteinBERT: a universal deep-learning model of protein sequence and function","volume":"38","author":"Brandes","year":"2022","journal-title":"Bioinformatics"},{"key":"2025111209270896700_ref12","doi-asserted-by":"crossref","first-page":"bbaf026","DOI":"10.1093\/bib\/bbaf026","article-title":"TargetCLP: clathrin proteins prediction combining transformed and evolutionary scale modeling-based multi-view features via weighted feature integration approach","volume":"26","author":"Ullah","year":"2025","journal-title":"Brief Bioinform"},{"key":"2025111209270896700_ref13","doi-asserted-by":"crossref","first-page":"bbae664","DOI":"10.1093\/bib\/bbae664","article-title":"Combining evolution and protein language models for an interpretable cancer driver mutation prediction with d2deep","volume":"26","author":"Tzavella","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025111209270896700_ref14","doi-asserted-by":"crossref","first-page":"bbaf016","DOI":"10.1093\/bib\/bbaf016","article-title":"Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences","volume":"26","author":"Basu","year":"2025","journal-title":"Brief Bioinform"},{"key":"2025111209270896700_ref15","doi-asserted-by":"crossref","first-page":"1681","DOI":"10.1093\/bioinformatics\/btab009","article-title":"Deepsurf: a surface-based deep learning approach for the prediction of ligand binding sites on proteins","volume":"37","author":"Mylonas","year":"2021","journal-title":"Bioinformatics"},{"key":"2025111209270896700_ref16","doi-asserted-by":"crossref","first-page":"btad718","DOI":"10.1093\/bioinformatics\/btad718","article-title":"DeepProSite: structure-aware protein binding site prediction using esmfold and pretrained language model","volume":"39","author":"Fang","year":"2023","journal-title":"Bioinformatics"},{"key":"2025111209270896700_ref17","doi-asserted-by":"crossref","first-page":"bbae330","DOI":"10.1093\/bib\/bbae330","article-title":"EGPDI: identifying protein\u2013DNA binding sites based on multi-view graph embedding fusion","volume":"25","author":"Zheng","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025111209270896700_ref18","doi-asserted-by":"publisher","first-page":"bbae040","DOI":"10.1101\/2023.05.30.542787","article-title":"ULDNA: integrating unsupervised multi-source language models with LSTM-attention network for high-accuracy protein\u2013DNA binding site prediction","volume":"25","author":"Zhu","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025111209270896700_ref19","doi-asserted-by":"crossref","first-page":"183904","DOI":"10.1007\/s11704-023-2640-9","article-title":"Protein acetylation sites with complex-valued polynomial model","volume":"18","author":"Bao","year":"2024","journal-title":"Front Comp Sci"},{"key":"2025111209270896700_ref20","doi-asserted-by":"crossref","first-page":"1277121","DOI":"10.3389\/fmicb.2023.1277121","article-title":"Oral_voting_transfer: classification of oral microorganisms\u2019 function proteins with voting transfer model","volume":"14","author":"Bao","year":"2024","journal-title":"Front Microbiol"},{"key":"2025111209270896700_ref21","article-title":"MMSite: a multi-modal framework for the identification of active sites in proteins","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems","author":"Ouyang","year":"2024"},{"key":"2025111209270896700_ref22","doi-asserted-by":"crossref","first-page":"btac793","DOI":"10.1093\/bioinformatics\/btac793","article-title":"Annotation of biologically relevant ligands in UniProtKB using ChEBI","volume":"39","author":"Coudert","year":"2023","journal-title":"Bioinformatics"},{"key":"2025111209270896700_ref23","doi-asserted-by":"crossref","first-page":"D523","DOI":"10.1093\/nar\/gkac1052","article-title":"Uniprot: the universal protein knowledgebase in 2023","volume":"51","author":"The UniProt Consortium","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025111209270896700_ref24","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025111209270896700_ref25","doi-asserted-by":"crossref","first-page":"D368","DOI":"10.1093\/nar\/gkad1011","article-title":"Alphafold protein structure database in 2024: providing structure coverage for over 214 million protein sequences","volume":"52","author":"Varadi","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025111209270896700_ref26","article-title":"E(n) equivariant graph neural networks","volume-title":"International Conference on Machine Learning","author":"Satorras","year":"2021"},{"key":"2025111209270896700_ref27","article-title":"Gradio: Hassle-free sharing and testing of ml models in the wild","author":"Abid","year":"2019"},{"key":"2025111209270896700_ref28"},{"key":"2025111209270896700_ref29","doi-asserted-by":"crossref","first-page":"3029","DOI":"10.1093\/bioinformatics\/btab184","article-title":"Fast and sensitive taxonomic assignment to metagenomic contigs","volume":"37","author":"Mirdita","year":"2021","journal-title":"Bioinformatics"},{"key":"2025111209270896700_ref30","doi-asserted-by":"crossref","first-page":"btae269","DOI":"10.1093\/bioinformatics\/btae269","article-title":"MEG-PPIS: a fast protein\u2013protein interaction site prediction method based on multi-scale graph information and equivariant graph neural network","volume":"40","author":"Ding","year":"2024","journal-title":"Bioinformatics"},{"key":"2025111209270896700_ref31","doi-asserted-by":"crossref","first-page":"1528","DOI":"10.1016\/j.bpj.2015.08.015","article-title":"MDTraj: a modern open library for the analysis of molecular dynamics trajectories","volume":"109","author":"McGibbon","year":"2015","journal-title":"Biophys J"},{"key":"2025111209270896700_ref32","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-46478-7_31","article-title":"A discriminative feature learning approach for deep face recognition","volume-title":"European Conference on Computer Vision","author":"Wen","year":"2016"},{"key":"2025111209270896700_ref33","doi-asserted-by":"crossref","first-page":"107852","DOI":"10.1016\/j.patcog.2021.107852","article-title":"Learning deep discriminative embeddings via joint rescaled features and log-probability centers","volume":"114","author":"Cai","year":"2021","journal-title":"Pattern Recognit"},{"key":"2025111209270896700_ref34","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025111209270896700_ref35","article-title":"Language models enable zero-shot prediction of the effects of mutations on protein function","author":"Meier","journal-title":"Thirty-Fifth Annual Conference on Neural Information Processing Systems"},{"key":"2025111209270896700_ref36","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","journal-title":"Science"},{"key":"2025111209270896700_ref37","doi-asserted-by":"crossref","DOI":"10.1126\/science.ads0018","article-title":"Simulating 500 million years of evolution with a language model","volume":"387","author":"Hayes","year":"2025","journal-title":"Science"},{"key":"2025111209270896700_ref38","first-page":"92","article-title":"PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications","volume":"16","author":"Tan","year":"2024","journal-title":"J Chem"},{"key":"2025111209270896700_ref39","doi-asserted-by":"crossref","first-page":"2404212","DOI":"10.1002\/advs.202404212","article-title":"S-PLM: structure-aware protein language model via contrastive learning between sequence and structure","volume":"12","author":"Wang","year":"2025","journal-title":"Adv Sci"},{"key":"2025111209270896700_ref40","article-title":"Evaluating protein transfer learning with tape","author":"Rao","journal-title":"Thirty-Third Annual Conference on Neural Information Processing Systems"},{"key":"2025111209270896700_ref41","doi-asserted-by":"crossref","first-page":"gzad015","DOI":"10.1093\/protein\/gzad015","article-title":"Masked inverse folding with sequence transfer for protein representation learning","volume":"36","author":"Yang","year":"2023","journal-title":"Protein Eng Des Sel"},{"key":"2025111209270896700_ref42","article-title":"Endowing protein language models with structural knowledge","author":"Chen","year":"2024"},{"key":"2025111209270896700_ref43","first-page":"1","article-title":"Domain-specific language model pretraining for biomedical natural language processing","volume":"3","author":"Yu","year":"2021","journal-title":"ACM Transactions on Computing for Healthcare"},{"key":"2025111209270896700_ref44","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten","year":"2008","journal-title":"J Mach Learn Res"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/6\/bbaf590\/65275890\/bbaf590.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/6\/bbaf590\/65275890\/bbaf590.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T14:27:18Z","timestamp":1762957638000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf590\/8321763"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,1]]},"references-count":44,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf590","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,11]]},"published":{"date-parts":[[2025,11,1]]},"article-number":"bbaf590"}}