{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,30]],"date-time":"2026-05-30T00:57:15Z","timestamp":1780102635171,"version":"3.54.0"},"reference-count":57,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2023,3,24]],"date-time":"2023-03-24T00:00:00Z","timestamp":1679616000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Guangzhou S&T Research Plan","award":["202002020047"],"award-info":[{"award-number":["202002020047"]}]},{"name":"Guangzhou S&T Research Plan","award":["202007030010"],"award-info":[{"award-number":["202007030010"]}]},{"name":"Guangdong Key Field R&D Plan","award":["2018B0101090060"],"award-info":[{"award-number":["2018B0101090060"]}]},{"name":"Guangdong Key Field R&D Plan","award":["2019B020228001"],"award-info":[{"award-number":["2019B020228001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["12126610"],"award-info":[{"award-number":["12126610"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2022YFF1203100"],"award-info":[{"award-number":["2022YFF1203100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,5,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Protein function prediction is an essential task in bioinformatics which benefits disease mechanism elucidation and drug target discovery. Due to the explosive growth of proteins in sequence databases and the diversity of their functions, it remains challenging to fast and accurately predict protein functions from sequences alone. Although many methods have integrated protein structures, biological networks or literature information to improve performance, these extra features are often unavailable for most proteins. Here, we propose SPROF-GO, a Sequence-based alignment-free PROtein Function predictor, which leverages a pretrained language model to efficiently extract informative sequence embeddings and employs self-attention pooling to focus on important residues. The prediction is further advanced by exploiting the homology information and accounting for the overlapping communities of proteins with related functions through the label diffusion algorithm. SPROF-GO was shown to surpass state-of-the-art sequence-based and even network-based approaches by more than 14.5, 27.3 and 10.1% in area under the precision-recall curve on the three sub-ontology test sets, respectively. Our method was also demonstrated to generalize well on non-homologous proteins and unseen species. Finally, visualization based on the attention mechanism indicated that SPROF-GO is able to capture sequence domains useful for function prediction. The datasets, source codes and trained models of SPROF-GO are available at https:\/\/github.com\/biomed-AI\/SPROF-GO. The SPROF-GO web server is freely available at http:\/\/bio-web1.nscc-gz.cn\/app\/sprof-go.<\/jats:p>","DOI":"10.1093\/bib\/bbad117","type":"journal-article","created":{"date-parts":[[2023,3,25]],"date-time":"2023-03-25T11:06:21Z","timestamp":1679742381000},"source":"Crossref","is-referenced-by-count":95,"title":["Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion"],"prefix":"10.1093","volume":"24","author":[{"given":"Qianmu","family":"Yuan","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering at Sun Yat-sen University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Junjie","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering at Sun Yat-sen University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiancong","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering at Sun Yat-sen University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Huiying","family":"Zhao","sequence":"additional","affiliation":[{"name":"Sun Yat-sen Memorial Hospital at Sun Yat-sen University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuedong","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering at Sun Yat-sen University"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2023,3,24]]},"reference":[{"key":"2023052021595722600_ref1","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1038\/35015694","article-title":"Protein function in the post-genomic era","volume":"405","author":"Eisenberg","year":"2000","journal-title":"Nature"},{"key":"2023052021595722600_ref2","doi-asserted-by":"crossref","first-page":"aaf1420","DOI":"10.1126\/science.aaf1420","article-title":"A global genetic interaction network maps a wiring diagram of cellular function","volume":"353","author":"Costanzo","year":"2016","journal-title":"Science"},{"key":"2023052021595722600_ref3","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"The UniProt Consortium","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023052021595722600_ref4","first-page":"55","article-title":"Protein function prediction, functional","author":"Cruz","year":"2017","journal-title":"Genomics"},{"key":"2023052021595722600_ref5","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat Methods"},{"key":"2023052021595722600_ref6","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"key":"2023052021595722600_ref7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2008-9-s1-s6","article-title":"Consistent probabilistic outputs for protein function prediction","volume":"9","author":"Obozinski","year":"2008","journal-title":"Genome Biol"},{"key":"2023052021595722600_ref8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-016-1037-6","article-title":"An expanded evaluation of protein function prediction methods shows an improvement in accuracy","volume":"17","author":"Jiang","year":"2016","journal-title":"Genome Biol"},{"key":"2023052021595722600_ref9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1835-8","article-title":"The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens","volume":"20","author":"Zhou","year":"2019","journal-title":"Genome Biol"},{"key":"2023052021595722600_ref10","doi-asserted-by":"crossref","first-page":"3674","DOI":"10.1093\/bioinformatics\/bti610","article-title":"Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research","volume":"21","author":"Conesa","year":"2005","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref11","doi-asserted-by":"crossref","first-page":"2465","DOI":"10.1093\/bioinformatics\/bty130","article-title":"GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank","volume":"34","author":"You","year":"2018","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref12","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023052021595722600_ref13","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.1093\/bioinformatics\/btu031","article-title":"InterProScan 5: genome-scale protein function classification","volume":"30","author":"Jones","year":"2014","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref14","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1093\/bioinformatics\/btz595","article-title":"DeepGOPlus: improved protein function prediction from sequence","volume":"36","author":"Kulmanov","year":"2020","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref15","doi-asserted-by":"crossref","first-page":"2825","DOI":"10.1093\/bioinformatics\/btab198","article-title":"TALE: transformer-based protein function annotation with joint sequence\u2013label embedding","volume":"37","author":"Cao","year":"2021","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-23303-9","article-title":"Structure-based protein function prediction using graph convolutional networks","volume":"12","author":"Gligorijevi\u0107","year":"2021","journal-title":"Nat Commun"},{"key":"2023052021595722600_ref17","doi-asserted-by":"crossref","first-page":"bbab502","DOI":"10.1093\/bib\/bbab502","article-title":"Accurate protein function prediction via graph attention networks with predicted structure information","volume":"23","author":"Lai","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023052021595722600_ref18","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1038\/35001165","article-title":"Guilt-by-association goes global","volume":"403","author":"Oliver","year":"2000","journal-title":"Nature"},{"key":"2023052021595722600_ref19","doi-asserted-by":"crossref","first-page":"W379","DOI":"10.1093\/nar\/gkz388","article-title":"NetGO: improving large-scale protein function prediction with massive network information","volume":"47","author":"You","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023052021595722600_ref20","doi-asserted-by":"crossref","first-page":"D605","DOI":"10.1093\/nar\/gkaa1074","article-title":"The STRING database in 2021: customizable protein\u2013protein networks, and functional characterization of user-uploaded gene\/measurement sets","volume":"49","author":"Szklarczyk","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023052021595722600_ref21","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/bioinformatics\/btx624","article-title":"DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier","volume":"34","author":"Kulmanov","year":"2018","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref22","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1038\/s42256-021-00419-7","article-title":"Protein function prediction for newly sequenced organisms","volume":"3","author":"Torres","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2023052021595722600_ref23","doi-asserted-by":"crossref","first-page":"i262","DOI":"10.1093\/bioinformatics\/btab270","article-title":"DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction","volume":"37","author":"You","year":"2021","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref24","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/j.ymeth.2018.05.026","article-title":"DeepText2GO: improving large-scale protein function prediction with deep semantic text representation","volume":"145","author":"You","year":"2018","journal-title":"Methods"},{"key":"2023052021595722600_ref25","doi-asserted-by":"crossref","first-page":"W469","DOI":"10.1093\/nar\/gkab398","article-title":"NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information","volume":"49","author":"Yao","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023052021595722600_ref26","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci"},{"key":"2023052021595722600_ref27","article-title":"ProtTrans: towards cracking the language of lifes code through self-supervised deep learning and high performance computing","volume":"44","author":"Elnaggar","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023052021595722600_ref28","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1038\/s42256-022-00457-9","article-title":"Learning functional properties of proteins with language models","volume":"4","author":"Unsal","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2023052021595722600_ref29","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac444","article-title":"Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023052021595722600_ref30","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1038\/nrg.2017.38","article-title":"Network propagation: a universal amplifier of genetic associations","volume":"18","author":"Cowen","year":"2017","journal-title":"Nat Rev Genet"},{"key":"2023052021595722600_ref31","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/978-1-4939-3167-5_2","volume-title":"Plant Bioinformatics","author":"Boutet","year":"2016"},{"key":"2023052021595722600_ref32","doi-asserted-by":"crossref","first-page":"D1057","DOI":"10.1093\/nar\/gku1113","article-title":"The GOA database: gene ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023052021595722600_ref33","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J Mach Learn Res"},{"key":"2023052021595722600_ref34","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1038\/s41592-019-0437-4","article-title":"Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold","volume":"16","author":"Steinegger","year":"2019","journal-title":"Nat Methods"},{"key":"2023052021595722600_ref35","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"UniRef: comprehensive and non-redundant UniProt reference clusters","volume":"23","author":"Suzek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref36","first-page":"4171","volume-title":"Proceedings of NAACL-HLT","author":"Kenton","year":"2019"},{"key":"2023052021595722600_ref37","article-title":"Layer normalization","author":"Ba"},{"key":"2023052021595722600_ref38","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J Mach Learn Res"},{"key":"2023052021595722600_ref39","first-page":"9662","volume-title":"Advances in Neural Information Processing Systems","author":"Giunchiglia","year":"2020"},{"key":"2023052021595722600_ref40","doi-asserted-by":"crossref","first-page":"e1008453","DOI":"10.1371\/journal.pcbi.1008453","article-title":"DeepPheno: predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier","volume":"16","author":"Kulmanov","year":"2020","journal-title":"PLoS Comput Biol"},{"key":"2023052021595722600_ref41","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using DIAMOND","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat Methods"},{"key":"2023052021595722600_ref42","volume-title":"3rd International Conference on Learning Representations (Poster)","author":"Kingma","year":"2015"},{"key":"2023052021595722600_ref43","first-page":"8026","article-title":"Pytorch: an imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023052021595722600_ref44","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1145\/1143844.1143874","volume-title":"Proceedings of the 23rd International Conference on Machine learning","author":"Davis","year":"2006"},{"key":"2023052021595722600_ref45","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PloS One"},{"key":"2023052021595722600_ref46","doi-asserted-by":"crossref","first-page":"i238","DOI":"10.1093\/bioinformatics\/btac256","article-title":"DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms","volume":"38","author":"Kulmanov","year":"2022","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref47","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1016\/j.neuron.2004.12.005","article-title":"Identification of PSD-95 palmitoylating enzymes","volume":"44","author":"Fukata","year":"2004","journal-title":"Neuron"},{"key":"2023052021595722600_ref48","doi-asserted-by":"crossref","first-page":"1650","DOI":"10.1101\/gad.4.10.1650","article-title":"Activity and tissue-specific expression of the transcription factor NF-E1 multigene family","volume":"4","author":"Yamamoto","year":"1990","journal-title":"Genes Dev"},{"key":"2023052021595722600_ref49","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1093\/bioinformatics\/btab643","article-title":"Structure-aware protein\u2013protein interaction site prediction using deep graph convolutional network","volume":"38","author":"Yuan","year":"2021","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref50","doi-asserted-by":"crossref","first-page":"bbab564","DOI":"10.1093\/bib\/bbab564","article-title":"AlphaFold2-aware protein\u2013DNA binding site prediction using graph transformer","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023052021595722600_ref51","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023052021595722600_ref52","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume-title":"Science","author":"Lin","year":"2023"},{"key":"2023052021595722600_ref53","doi-asserted-by":"crossref","first-page":"e1010793","DOI":"10.1371\/journal.pcbi.1010793","article-title":"Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction","volume":"18","author":"Zhu","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2023052021595722600_ref54","first-page":"1597","volume-title":"International Conference on Machine Learning","author":"Chen","year":"2020"},{"key":"2023052021595722600_ref55","doi-asserted-by":"crossref","first-page":"bbaa344","DOI":"10.1093\/bib\/bbaa344","article-title":"PharmKG: a dedicated knowledge graph benchmark for bomedical data mining","volume":"22","author":"Zheng","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023052021595722600_ref56","doi-asserted-by":"crossref","first-page":"4488","DOI":"10.1093\/bioinformatics\/btac536","article-title":"Hierarchical deep learning for predicting GO annotations by integrating protein knowledge","volume":"38","author":"Merino","year":"2022","journal-title":"Bioinformatics"},{"key":"2023052021595722600_ref57","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1142\/9789811258589_0009","article-title":"Sequence-based predictions of residues that bind proteins and peptides","volume-title":"Machine Learning in Bioinformatics of Protein Sequences","author":"Yuan","year":"2023"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/3\/bbad117\/50410866\/bbad117.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/3\/bbad117\/50410866\/bbad117.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,20]],"date-time":"2023-05-20T22:01:43Z","timestamp":1684620103000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad117\/7085635"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,24]]},"references-count":57,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,5,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad117","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,5]]},"published":{"date-parts":[[2023,3,24]]},"article-number":"bbad117"}}