{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T04:05:39Z","timestamp":1776398739225,"version":"3.51.2"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2023,4,13]],"date-time":"2023-04-13T00:00:00Z","timestamp":1681344000000},"content-version":"vor","delay-in-days":12,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,4,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Protein representation learning methods have shown great potential to many downstream tasks in biological applications. A few recent studies have demonstrated that the self-supervised learning is a promising solution to addressing insufficient labels of proteins, which is a major obstacle to effective protein representation learning. However, existing protein representation learning is usually pretrained on protein sequences without considering the important protein structural information.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this work, we propose a novel structure-aware protein self-supervised learning method to effectively capture structural information of proteins. In particular, a graph neural network model is pretrained to preserve the protein structural information with self-supervised tasks from a pairwise residue distance perspective and a dihedral angle perspective, respectively. Furthermore, we propose to leverage the available protein language model pretrained on protein sequences to enhance the self-supervised learning. 
Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed graph neural network model via a novel pseudo bi-level optimization scheme. We conduct experiments on three downstream tasks: the binary classification into membrane\/non-membrane proteins, the location classification into 10 cellular compartments, and the enzyme-catalyzed reaction classification into 384 EC numbers, and these experiments verify the effectiveness of our proposed method.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The Alphafold2 database is available in https:\/\/alphafold.ebi.ac.uk\/. The PDB files are available in https:\/\/www.rcsb.org\/. The downstream tasks are available in https:\/\/github.com\/phermosilla\/IEConv_proteins\/tree\/master\/Datasets. The code of the proposed method is available in https:\/\/github.com\/GGchen1997\/STEPS_Bioinformatics.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad189","type":"journal-article","created":{"date-parts":[[2023,4,13]],"date-time":"2023-04-13T15:33:50Z","timestamp":1681400030000},"source":"Crossref","is-referenced-by-count":45,"title":["Structure-aware protein self-supervised learning"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4244-055X","authenticated-orcid":false,"given":"Can (Sam)","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computer Science, McGill University , 845 Rue Sherbrooke O , Montreal, Quebec H3A 0G4, Canada"},{"name":"MILA\u2014Quebec AI Institute , 6666 Rue Saint-Urbain , Montreal, Quebec H2S 3H1, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2677-7021","authenticated-orcid":false,"given":"Jingbo","family":"Zhou","sequence":"additional","affiliation":[{"name":"Baidu Research , Xibeiwang East Road, 
Haidian District , Beijing 100193, China"}]},{"given":"Fan","family":"Wang","sequence":"additional","affiliation":[{"name":"Baidu Inc. , Xuefu Road East, Nanshan District , Shenzhen 518000, China"}]},{"given":"Xue","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science, McGill University , 845 Rue Sherbrooke O , Montreal, Quebec H3A 0G4, Canada"}]},{"given":"Dejing","family":"Dou","sequence":"additional","affiliation":[{"name":"BCG X , Level 22, West Tower, Genesis Beijing 8 Xinyuan South Road, Chaoyang District , Beijing 100027, China"}]}],"member":"286","published-online":{"date-parts":[[2023,4,13]]},"reference":[{"key":"2023042723052060800_btad189-B1","doi-asserted-by":"crossref","first-page":"4049","DOI":"10.1093\/bioinformatics\/btx548","article-title":"DeepLoc: prediction of protein subcellular localization using deep learning","volume":"33","author":"Almagro Armenteros","year":"2017","journal-title":"Bioinformatics"},{"key":"2023042723052060800_btad189-B2","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1126\/science.181.4096.223","article-title":"Principles that govern the folding of protein chains","volume":"181","author":"Anfinsen","year":"1973","journal-title":"Science"},{"key":"2023042723052060800_btad189-B3","author":"Bepler","year":"2019"},{"key":"2023042723052060800_btad189-B4","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1016\/j.cels.2021.05.017","article-title":"Learning the protein language: evolution, structure, and function","volume":"12","author":"Bepler","year":"2021","journal-title":"Cell Syst"},{"key":"2023042723052060800_btad189-B5","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1038\/d41586-020-00341-9","article-title":"Revolutionary cryo-EM is taking over structural 
biology","volume":"578","author":"Callaway","year":"2020","journal-title":"Nature"},{"key":"2023042723052060800_btad189-B6","author":"Chen","year":"2022"},{"key":"2023042723052060800_btad189-B7","author":"Chen","year":"2022"},{"key":"2023042723052060800_btad189-B8","author":"Chen","year":"2022"},{"key":"2023042723052060800_btad189-B9","author":"Chen","year":"2023"},{"key":"2023042723052060800_btad189-B10","author":"Chen","year":"2021"},{"key":"2023042723052060800_btad189-B11","doi-asserted-by":"crossref","first-page":"e1000470","DOI":"10.1371\/journal.pcbi.1000470","article-title":"Four distances between pairs of amino acids provide a precise description of their interaction","volume":"5","author":"Cohen","year":"2009","journal-title":"PLoS Comput Biol"},{"key":"2023042723052060800_btad189-B12","author":"Dodge","year":"2020"},{"key":"2023042723052060800_btad189-B13","author":"Elnaggar","year":"2021"},{"key":"2023042723052060800_btad189-B14","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1038\/s42256-021-00438-4","article-title":"Geometry-enhanced molecular representation learning for property prediction","volume":"4","author":"Fang","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2023042723052060800_btad189-B15","doi-asserted-by":"crossref","first-page":"3168","DOI":"10.1038\/s41467-021-23303-9","article-title":"Structure-based protein function prediction using graph convolutional networks","volume":"12","author":"Gligorijevi\u0107","year":"2021","journal-title":"Nat Commun"},{"key":"2023042723052060800_btad189-B16","author":"Hermosilla","year":"2020"},{"key":"2023042723052060800_btad189-B17","author":"Hospedales","year":"2020"},{"key":"2023042723052060800_btad189-B18","doi-asserted-by":"crossref","first-page":"1295","DOI":"10.1093\/bioinformatics\/btx780","article-title":"DeepSF: deep convolutional neural network for mapping protein sequences to 
folds","volume":"34","author":"Hou","year":"2018","journal-title":"Bioinformatics"},{"key":"2023042723052060800_btad189-B19","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023042723052060800_btad189-B20","article-title":"f-GAN: training generative neural samplers using variational divergence minimization","volume":"29","author":"Nowozin","year":"2016","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023042723052060800_btad189-B21","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat Methods"},{"key":"2023042723052060800_btad189-B22","first-page":"9689","article-title":"Evaluating protein transfer learning with TAPE","volume":"32","author":"Rao","year":"2019","journal-title":"Proc. Adv. Neur. Inf. Proc. 
Syst (NeurIPS)"},{"key":"2023042723052060800_btad189-B23","author":"Rao","year":"2020"},{"key":"2023042723052060800_btad189-B24","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023042723052060800_btad189-B25","author":"Somnath","year":"2021"},{"key":"2023042723052060800_btad189-B26","first-page":"68","author":"S\u00f8nderby","year":"2015"},{"key":"2023042723052060800_btad189-B27","author":"Townshend","year":"2019"},{"key":"2023042723052060800_btad189-B28","author":"Vig"},{"key":"2023042723052060800_btad189-B29","author":"Wang","year":"2023"},{"key":"2023042723052060800_btad189-B30","author":"Wang","year":"2018"},{"key":"2023042723052060800_btad189-B31","first-page":"1873","author":"Xia","year":"2021"},{"key":"2023042723052060800_btad189-B32","author":"Xu","year":"2018"},{"key":"2023042723052060800_btad189-B33","author":"Zhang","year":"2023"},{"key":"2023042723052060800_btad189-B34","author":"Zhang"},{"key":"2023042723052060800_btad189-B35","author":"Zhou","year":"2023"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad189\/49880136\/btad189.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/4\/btad189\/50117092\/btad189.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/4\/btad189\/50117092\/btad189.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4
,27]],"date-time":"2023-04-27T23:06:11Z","timestamp":1682636771000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad189\/7117544"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,4,1]]},"references-count":35,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,4,3]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad189","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,4,1]]},"published":{"date-parts":[[2023,4,1]]},"article-number":"btad189"}}