{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T11:39:04Z","timestamp":1768909144893,"version":"3.49.0"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T00:00:00Z","timestamp":1746489600000},"content-version":"vor","delay-in-days":5,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2021YFA0910700"],"award-info":[{"award-number":["2021YFA0910700"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32470704"],"award-info":[{"award-number":["32470704"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies","award":["2022B1212010005"],"award-info":[{"award-number":["2022B1212010005"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>In recent years, protein function prediction has broken through the bottleneck of sequence features, significantly improving prediction accuracy using high-precision protein structures predicted by AlphaFold2. While single-species protein function prediction methods have achieved remarkable success, multi-species approaches still face challenges such as difficulties in multi-source data integration and insufficient knowledge transfer between distantly-related species. How to integrate large-scale data and provide effective cross-species label propagation for species with sparse protein annotations remains a critical and unresolved challenge. To address this problem, we propose the MSNGO (Multi-species protein Structures and Network to predict GO terms) model, which integrates structural features and network propagation methods. Our validation shows that using structural features can significantly improve the accuracy of multi-species protein function prediction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We employ graph representation learning techniques to extract amino acid representations from protein structure contact maps and train a structural model using a graph convolution pooling module to derive protein-level structural features. After incorporating the sequence features from ESM-2, we apply a network propagation algorithm to aggregate information and update node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and protein-protein networks.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/blingbell\/MSNGO.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf285","type":"journal-article","created":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T16:25:04Z","timestamp":1746548704000},"source":"Crossref","is-referenced-by-count":2,"title":["MSNGO: multi-species protein function annotation based on 3D protein structure and network propagation"],"prefix":"10.1093","volume":"41","author":[{"given":"Beibei","family":"Wang","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) , Shenzhen, Guangdong 518055,","place":["China"]}]},{"given":"Boyue","family":"Cui","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) , Shenzhen, Guangdong 518055,","place":["China"]}]},{"given":"Shiqu","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) , Shenzhen, Guangdong 518055,","place":["China"]}]},{"given":"Xuan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) , Shenzhen, Guangdong 518055,","place":["China"]},{"name":"Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology (Shenzhen) , Shenzhen, Guangdong 518055,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6500-6217","authenticated-orcid":false,"given":"Yadong","family":"Wang","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Faculty of Computing, Harbin Institute of Technology , Harbin, Heilongjiang 150001,","place":["China"]},{"name":"Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology , Harbin, Heilongjiang 150001,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8045-5264","authenticated-orcid":false,"given":"Junyi","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen) , Shenzhen, Guangdong 518055,","place":["China"]},{"name":"Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology (Shenzhen) , Shenzhen, Guangdong 518055,","place":["China"]},{"name":"Key Laboratory of Biological Bigdata, Ministry of Education, Harbin Institute of Technology , Harbin, Heilongjiang 150001,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,5,6]]},"reference":[{"key":"2025052922563906900_btaf285-B1","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"key":"2025052922563906900_btaf285-B2","doi-asserted-by":"crossref","first-page":"2414","DOI":"10.1093\/bioinformatics\/btab098","article-title":"NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity","volume":"37","author":"Barot","year":"2021","journal-title":"Bioinformatics"},{"key":"2025052922563906900_btaf285-B3","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1038\/s41592-021-01101-x","article-title":"Sensitive protein alignments at tree-of-life scale using DIAMOND","volume":"18","author":"Buchfink","year":"2021","journal-title":"Nat Methods"},{"key":"2025052922563906900_btaf285-B4","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1016\/j.tig.2013.09.005","article-title":"CAFA and the open world of protein function predictions","volume":"29","author":"Dessimoz","year":"2013","journal-title":"Trends Genet"},{"key":"2025052922563906900_btaf285-B5","doi-asserted-by":"crossref","first-page":"3168","DOI":"10.1038\/s41467-021-23303-9","article-title":"Structure-based protein function prediction using graph convolutional networks","volume":"12","author":"Gligorijevic","year":"2021","journal-title":"Nat Commun"},{"key":"2025052922563906900_btaf285-B6","author":"Grover"},{"key":"2025052922563906900_btaf285-B7","author":"Hie"},{"key":"2025052922563906900_btaf285-B8","doi-asserted-by":"crossref","first-page":"D1057","DOI":"10.1093\/nar\/gku1113","article-title":"The GOA database: gene ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025052922563906900_btaf285-B9","doi-asserted-by":"crossref","first-page":"btad637","DOI":"10.1093\/bioinformatics\/btad637","article-title":"Struct2GO: protein function prediction based on graph pooling algorithm and AlphaFold2 structure information","volume":"39","author":"Jiao","year":"2023","journal-title":"Bioinformatics"},{"key":"2025052922563906900_btaf285-B10","author":"Jing"},{"key":"2025052922563906900_btaf285-B11","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.1093\/bioinformatics\/btu031","article-title":"InterProScan 5: genome-scale protein function classification","volume":"30","author":"Jones","year":"2014","journal-title":"Bioinformatics"},{"key":"2025052922563906900_btaf285-B12","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025052922563906900_btaf285-B13","author":"Kipf"},{"key":"2025052922563906900_btaf285-B14","doi-asserted-by":"crossref","first-page":"btae655","DOI":"10.1093\/bioinformatics\/btae655","article-title":"InterLabelGO+: unraveling label correlations in protein function prediction","volume":"40","author":"Liu","year":"2024","journal-title":"Bioinformatics"},{"key":"2025052922563906900_btaf285-B15","doi-asserted-by":"crossref","first-page":"1705","DOI":"10.1038\/s42003-024-07411-y","article-title":"Annotating protein functions via fusing multiple biological modalities","volume":"7","author":"Ma","year":"2024","journal-title":"Commun Biol"},{"key":"2025052922563906900_btaf285-B16","author":"Perozzi","year":"2014"},{"key":"2025052922563906900_btaf285-B17","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025052922563906900_btaf285-B18","doi-asserted-by":"crossref","first-page":"12763","DOI":"10.1073\/pnas.0806627105","article-title":"Global alignment of multiple protein interaction networks with application to functional orthology detection","volume":"105","author":"Singh","year":"2008","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025052922563906900_btaf285-B19","doi-asserted-by":"crossref","first-page":"D607","DOI":"10.1093\/nar\/gky1131","article-title":"STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets","volume":"47","author":"Szklarczyk","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025052922563906900_btaf285-B20","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1038\/s42256-021-00419-7","article-title":"Protein function prediction for newly sequenced organisms","volume":"3","author":"Torres","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2025052922563906900_btaf285-B21","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"UniProt","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025052922563906900_btaf285-B22","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025052922563906900_btaf285-B23","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1038\/s41467-024-54816-8","article-title":"DPFunc: accurately predicting protein function via deep learning with domain-guided structure information","volume":"16","author":"Wang","year":"2025","journal-title":"Nat Commun"},{"key":"2025052922563906900_btaf285-B24","doi-asserted-by":"crossref","first-page":"1713","DOI":"10.1109\/TCBB.2022.3215257","article-title":"PSPGO: cross-species heterogeneous network propagation for protein function prediction","volume":"20","author":"Wu","year":"2023","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025052922563906900_btaf285-B25","doi-asserted-by":"crossref","first-page":"i262","DOI":"10.1093\/bioinformatics\/btab270","article-title":"DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction","volume":"37","author":"You","year":"2021","journal-title":"Bioinformatics"},{"key":"2025052922563906900_btaf285-B26","doi-asserted-by":"crossref","first-page":"bbad117","DOI":"10.1093\/bib\/bbad117","article-title":"Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion","volume":"24","author":"Yuan","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025052922563906900_btaf285-B27","doi-asserted-by":"crossref","first-page":"bbad243","DOI":"10.1093\/bib\/bbad243","article-title":"Large-scale predicting protein functions through heterogeneous feature fusion","volume":"24","author":"Zheng","year":"2023","journal-title":"Brief Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf285\/63069972\/btaf285.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/5\/btaf285\/63069972\/btaf285.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/5\/btaf285\/63069972\/btaf285.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T02:56:44Z","timestamp":1748573804000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf285\/8125805"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,5]]},"references-count":27,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,5,6]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf285","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2025,5]]},"article-number":"btaf285"}}