{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T01:40:23Z","timestamp":1769823623736,"version":"3.49.0"},"reference-count":18,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2025,9,14]],"date-time":"2025-09-14T00:00:00Z","timestamp":1757808000000},"content-version":"vor","delay-in-days":13,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute for Information & communications Technology Planning & Evaluation"},{"name":"Korea government","award":["II212068"],"award-info":[{"award-number":["II212068"]}]},{"name":"Artificial Intelligence Innovation Hub","award":["00220628"],"award-info":[{"award-number":["00220628"]}]},{"name":"Artificial intelligence","award":["II201373"],"award-info":[{"award-number":["II201373"]}]},{"name":"Artificial Intelligence Graduate School Program, Hanyang University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Accurately predicting protein function from sequence remains a fundamental yet challenging goal in computational biology. Although recent advances have enabled the reliable prediction of protein 3D structures from sequences, utilizing structural information alone for functional inference has shown limited success. To address this gap, previous work has explored the integration of sequence and structural data by representing proteins as graphs, where residues are modeled as nodes, and spatial proximity defines edges. However, since the number of amino acids can vary significantly between proteins, the resulting graphs, constructed based on amino acids, also differ greatly in size. This large variation poses a challenge, as it becomes extremely difficult to extract generalizable information from graphs of such differing scales accurately. In this work, we propose Structure-guided Sequence Representation Learning, a novel framework that incorporates structural knowledge to extract informative, multiscale features directly from protein sequences. By embedding structural information into a sequence-based learning paradigm, our method captures functionally meaningful representations more effectively. Furthermore, we present a generalizable model architecture designed for multitask learning and inference, offering improved performance and flexibility over traditional task-specific approaches to protein function prediction.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this article, we demonstrate that the proposed novel attention pooling method on protein graphs effectively integrates global structural features and local chemical properties of amino acids in various-length proteins. Through this approach, we improve performance in tasks related to predicting protein functions, functional expression sites, and their relationships with structure and sequence. By effectively extracting the information needed to predict multiple protein functions simultaneously, we improve efficiency by eliminating the need for separate learning.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The code implementation is available at https:\/\/github.com\/vanha9\/S2RL_protein and has also been archived on zenodo: https:\/\/doi.org\/10.5281\/zenodo.16441001.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf511","type":"journal-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T19:26:57Z","timestamp":1758310017000},"source":"Crossref","is-referenced-by-count":1,"title":["Structure-guided sequence representation learning for generalizable protein function prediction"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-7161-7989","authenticated-orcid":false,"given":"SeokJun","family":"On","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence, Hanyang University , Seoul 04763,","place":["Republic of Korea"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5831-2932","authenticated-orcid":false,"given":"Yujin","family":"Jeong","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, Hanyang University , Seoul 04763,","place":["Republic of Korea"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-1167-539X","authenticated-orcid":false,"given":"Eun-Sol","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, Hanyang University , Seoul 04763,","place":["Republic of Korea"]},{"name":"Department of Computer Science, Hanyang University , Seoul 04763,","place":["Republic of Korea"]}]}],"member":"286","published-online":{"date-parts":[[2025,9,14]]},"reference":[{"key":"2025092914362130900_btaf511-B1","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped blast and psi-blast: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2025092914362130900_btaf511-B2","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2025092914362130900_btaf511-B3","author":"Boadu","year":"2025"},{"key":"2025092914362130900_btaf511-B4","doi-asserted-by":"publisher","first-page":"3460","DOI":"10.1093\/bioinformatics\/btv398","article-title":"Functional classification of cath superfamilies: a domain-based approach for protein function annotation","volume":"31","author":"Das","year":"2015","journal-title":"Bioinformatics"},{"key":"2025092914362130900_btaf511-B5","first-page":"2083","author":"Gao","year":"2019"},{"key":"2025092914362130900_btaf511-B6","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-021-23303-9","volume-title":"Nat Comm","author":"Gligorijevic","year":"2021"},{"key":"2025092914362130900_btaf511-B7","doi-asserted-by":"publisher","first-page":"btad410","DOI":"10.1093\/bioinformatics\/btad410","article-title":"Hierarchical graph transformer with contrastive learning for protein function prediction","volume":"39","author":"Gu","year":"2023","journal-title":"Bioinformatics"},{"key":"2025092914362130900_btaf511-B8","doi-asserted-by":"publisher","first-page":"btad637","DOI":"10.1093\/bioinformatics\/btad637","article-title":"Struct2go: protein function prediction based on graph pooling algorithm and alphafold2 structure information","volume":"39","author":"Jiao","year":"2023","journal-title":"Bioinformatics"},{"key":"2025092914362130900_btaf511-B9","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025092914362130900_btaf511-B10","doi-asserted-by":"publisher","author":"Kipf","DOI":"10.48550\/arXiv.1609.02907"},{"key":"2025092914362130900_btaf511-B11","doi-asserted-by":"publisher","first-page":"660","DOI":"10.1093\/bioinformatics\/btx624","article-title":"Deepgo: predicting protein functions from sequence and interactions using a deep ontology-aware classifier","volume":"34","author":"Kulmanov","year":"2017","journal-title":"Bioinformatics"},{"key":"2025092914362130900_btaf511-B12","doi-asserted-by":"publisher","first-page":"bbab502","DOI":"10.1093\/bib\/bbab502","article-title":"Accurate protein function prediction via graph attention networks with predicted structure information","volume":"23","author":"Lai","year":"2021","journal-title":"Brief Bioinform"},{"key":"2025092914362130900_btaf511-B13","doi-asserted-by":"publisher","author":"Lee","year":"2019","DOI":"10.48550\/arXiv.1904.08082"},{"key":"2025092914362130900_btaf511-B14","doi-asserted-by":"crossref","first-page":"e70182","DOI":"10.1002\/pro.70182","article-title":"Gobeacon: an ensemble model for protein function prediction enhanced by contrastive learning","volume":"34","author":"Lin","year":"2025","journal-title":"Protein Sci"},{"key":"2025092914362130900_btaf511-B15","doi-asserted-by":"publisher","first-page":"btae571","DOI":"10.1093\/bioinformatics\/btae571","article-title":"Tawfn: a deep learning framework for protein function prediction","volume":"40","author":"Meng","year":"2024","journal-title":"Bioinformatics"},{"key":"2025092914362130900_btaf511-B16","doi-asserted-by":"publisher","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025092914362130900_btaf511-B17","doi-asserted-by":"publisher","author":"Veli\u010dkovi\u0107","year":"2018","DOI":"10.48550\/arXiv.1710.10903"},{"key":"2025092914362130900_btaf511-B18","doi-asserted-by":"publisher","author":"Ying","year":"2018","DOI":"10.48550\/arXiv.1806.08804"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf511\/64263392\/btaf511.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/9\/btaf511\/64263392\/btaf511.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/9\/btaf511\/64263392\/btaf511.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T18:36:31Z","timestamp":1759170991000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf511\/8253735"}},"subtitle":[],"editor":[{"given":"Jianlin","family":"Cheng","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,9,1]]},"references-count":18,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf511","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,9]]},"published":{"date-parts":[[2025,9,1]]},"article-number":"btaf511"}}