{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T00:07:24Z","timestamp":1775606844202,"version":"3.50.1"},"reference-count":61,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,12,9]],"date-time":"2021-12-09T00:00:00Z","timestamp":1639008000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM089753"],"award-info":[{"award-number":["R01GM089753"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Experimental protein function annotation does not scale with the fast-growing sequence databases. Only a tiny fraction (&lt;0.1%) of protein sequences has experimentally determined functional annotations. Computational methods may predict protein function very quickly, but their accuracy is not very satisfactory. Based upon recent breakthroughs in protein structure prediction and protein language models, we develop GAT-GO, a graph attention network (GAT) method that may substantially improve protein function prediction by leveraging predicted structure information and protein sequence embedding. Our experimental results show that GAT-GO greatly outperforms the latest sequence- and structure-based deep learning methods. 
On the PDB-mmseqs testset where the train and test proteins share &lt;15% sequence identity, our GAT-GO yields Fmax (maximum F-score) 0.508, 0.416, 0.501, and area under the precision-recall curve (AUPRC) 0.427, 0.253, 0.411 for the MFO, BPO, CCO ontology domains, respectively, much better than the homology-based method BLAST (Fmax 0.117, 0.121, 0.207 and AUPRC 0.120, 0.120, 0.163) that does not use any structure information. On the PDB-cdhit testset where the training and test proteins are more similar, although using predicted structure information, our GAT-GO obtains Fmax 0.637, 0.501, 0.542 for the MFO, BPO, CCO ontology domains, respectively, and AUPRC 0.662, 0.384, 0.481, significantly exceeding the just-published method DeepFRI that uses experimental structures, which has Fmax 0.542, 0.425, 0.424 and AUPRC only 0.313, 0.159, 0.193.<\/jats:p>","DOI":"10.1093\/bib\/bbab502","type":"journal-article","created":{"date-parts":[[2021,11,5]],"date-time":"2021-11-05T08:15:48Z","timestamp":1636100148000},"source":"Crossref","is-referenced-by-count":115,"title":["Accurate protein function prediction via graph attention networks with predicted structure information"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4201-6786","authenticated-orcid":false,"given":"Boqiao","family":"Lai","sequence":"first","affiliation":[{"name":"Toyota Technological Institute at Chicago, Chicago, IL 60637, USA"}]},{"given":"Jinbo","family":"Xu","sequence":"additional","affiliation":[{"name":"Toyota Technological Institute at Chicago, Chicago, IL 60637, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,12,9]]},"reference":[{"key":"2022030514391754500_ref1","doi-asserted-by":"crossref","first-page":"2699","DOI":"10.1093\/nar\/gky092","article-title":"UniProt: the universal protein knowledgebase","volume":"46","author":"Consortium, U., Others","year":"2018","journal-title":"Nucleic Acids 
Res"},{"key":"2022030514391754500_ref2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1835-8","article-title":"Others: the CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens","volume":"20","author":"Zhou","year":"2019","journal-title":"Genome Biol"},{"key":"2022030514391754500_ref3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-016-1037-6","article-title":"Others: an expanded evaluation of protein function prediction methods shows an improvement in accuracy","volume":"17","author":"Jiang","year":"2016","journal-title":"Genome Biol"},{"key":"2022030514391754500_ref4","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"Others: a large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat Methods"},{"key":"2022030514391754500_ref5","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0198216","article-title":"Predicting human protein function with multi-task deep neural networks","volume":"13","author":"Fa","year":"2018","journal-title":"PLoS One"},{"key":"2022030514391754500_ref6","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1093\/bioinformatics\/btz595","article-title":"DeepGOPlus: improved protein function prediction from sequence","volume":"36","author":"Kulmanov","year":"2020","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref7","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/bioinformatics\/btx624","article-title":"DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier","volume":"34","author":"Kulmanov","year":"2018","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref8","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/j.ymeth.2018.05.026","article-title":"DeepText2GO: improving 
large-scale protein function prediction with deep semantic text representation","volume":"145","author":"You","year":"2018","journal-title":"Methods"},{"key":"2022030514391754500_ref9","doi-asserted-by":"crossref","first-page":"2465","DOI":"10.1093\/bioinformatics\/bty130","article-title":"GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank","volume":"34","author":"You","year":"2018","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref10","article-title":"Annotating gene ontology terms for protein sequences with the transformer model","author":"Duong","year":"2020","journal-title":"bioRxiv"},{"key":"2022030514391754500_ref11","doi-asserted-by":"crossref","first-page":"391","DOI":"10.3389\/fbioe.2020.00391","article-title":"SDN2GO: an integrated deep learning model for protein function prediction","volume":"8","author":"Cai","year":"2020","journal-title":"Front Bioeng Biotechnol"},{"key":"2022030514391754500_ref12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/srep31865","article-title":"FFPred 3: feature-based function prediction for all gene ontology domains","volume":"6","author":"Cozzetto","year":"2016","journal-title":"Sci Rep"},{"key":"2022030514391754500_ref13","doi-asserted-by":"crossref","first-page":"W379","DOI":"10.1093\/nar\/gkz388","article-title":"NetGO: improving large-scale protein function prediction with massive network information","volume":"47","author":"You","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022030514391754500_ref14","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1016\/j.bbamcr.2010.01.022","article-title":"Moonlighting proteins: an intriguing mode of multitasking","volume":"1803","author":"Huberts","year":"2010","journal-title":"Biochim Biophys Acta"},{"key":"2022030514391754500_ref15","article-title":"Structure-based function prediction using graph convolutional 
networks","volume":"1","author":"Gligorijevic","year":"2021","journal-title":"Nature communications"},{"key":"2022030514391754500_ref16","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1016\/j.jmb.2008.12.072","article-title":"Predicting protein function and binding profile via matching of local evolutionary and geometric surface patterns","volume":"387","author":"Tseng","year":"2009","journal-title":"J Mol Biol"},{"key":"2022030514391754500_ref17","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1093\/molbev\/msj048","article-title":"Estimation of amino acid residue substitution rates at local spatial regions and application in protein function inference: a Bayesian Monte Carlo approach","volume":"23","author":"Tseng","year":"2006","journal-title":"Mol Biol Evol"},{"key":"2022030514391754500_ref18","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1007\/s10969-011-9109-z","article-title":"Accuracy of functional surfaces on comparatively modeled protein structures","volume":"12","author":"Zhao","year":"2011","journal-title":"J Struct Funct Genomics"},{"key":"2022030514391754500_ref19","doi-asserted-by":"crossref","first-page":"W555","DOI":"10.1093\/nar\/gkh390","article-title":"pvSOAR: detecting similar surface patterns of pocket and void surfaces of amino acid residues on proteins","volume":"32","author":"Binkowski","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2022030514391754500_ref20","doi-asserted-by":"crossref","first-page":"D351","DOI":"10.1093\/nar\/gky1100","article-title":"Others: InterPro in 2019: improving coverage, classification and access to protein sequence annotations","volume":"47","author":"Mitchell","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022030514391754500_ref21","doi-asserted-by":"crossref","first-page":"D289","DOI":"10.1093\/nar\/gkw1098","article-title":"CATH: an expanded resource to predict protein function through structure and 
sequence","volume":"45","author":"Dawson","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2022030514391754500_ref22","doi-asserted-by":"crossref","first-page":"717","DOI":"10.1093\/bioinformatics\/btm006","article-title":"On the relationship between sequence and structure similarities in proteomics","volume":"23","author":"Krissinel","year":"2007","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref23","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng"},{"key":"2022030514391754500_ref24","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1016\/S0076-6879(96)66039-X","article-title":"Understanding protein structure: using scop for fold interpretation","volume":"266","author":"Brenner","year":"1996","journal-title":"Methods Enzymol"},{"key":"2022030514391754500_ref25","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1126\/science.273.5275.595","article-title":"Mapping the protein universe","volume":"273","author":"Holm","year":"1996","journal-title":"Science"},{"key":"2022030514391754500_ref26","doi-asserted-by":"crossref","first-page":"2889","DOI":"10.1093\/bioinformatics\/btw473","article-title":"Functional classification of CATH superfamilies: a domain-based approach for protein function annotation","volume":"32","author":"Das","year":"2016","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref27","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1005324","article-title":"Accurate de novo prediction of protein contact map by ultra-deep learning model","volume":"13","author":"Wang","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2022030514391754500_ref28","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1038\/s41586-019-1923-7","article-title":"Others: improved protein structure prediction using potentials from deep 
learning","volume":"577","author":"Senior","year":"2020","journal-title":"Nature"},{"key":"2022030514391754500_ref29","doi-asserted-by":"crossref","first-page":"16856","DOI":"10.1073\/pnas.1821309116","article-title":"Distance-based protein folding powered by deep learning","volume":"116","author":"Xu","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2022030514391754500_ref30","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv Neural Inf Process Syst"},{"key":"2022030514391754500_ref31","first-page":"770","volume-title":"In: Proceedings of the IEEE conference on computer vision and pattern recognition","author":"He","year":"2016"},{"key":"2022030514391754500_ref32","article-title":"X.: predicting epigenomic functions of genetic variants in the context of neurodevelopment via deep transfer learning","author":"Lai","year":"2021","journal-title":"bioRxiv"},{"key":"2022030514391754500_ref33","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning--based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat Methods"},{"key":"2022030514391754500_ref34","doi-asserted-by":"crossref","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"Kelley","year":"2016","journal-title":"Genome Res"},{"key":"2022030514391754500_ref35","first-page":"7099","article-title":"DeepCLIP: predicting the effect of mutations on protein--RNA binding with deep learning","volume":"48","author":"Gr\u00f8nning","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022030514391754500_ref36","doi-asserted-by":"crossref","first-page":"1496","DOI":"10.1073\/pnas.1914677117","article-title":"Improved 
protein structure prediction using predicted interresidue orientations","volume":"117","author":"Yang","year":"2020","journal-title":"Proc Natl Acad Sci"},{"key":"2022030514391754500_ref37","article-title":"Semi-supervised classification with graph convolutional networks","author":"Kipf","year":"2016","journal-title":"arXiv preprint"},{"key":"2022030514391754500_ref38","article-title":"Spectral networks and locally connected networks on graphs","author":"Bruna","year":"2013","journal-title":"arXiv"},{"key":"2022030514391754500_ref39","article-title":"Deep convolutional networks on graph-structured data","author":"Henaff","year":"2015","journal-title":"arXiv"},{"key":"2022030514391754500_ref40","article-title":"Graph attention networks","author":"Veli\u010dkovi\u0107","year":"2017","journal-title":"arXiv preprint"},{"key":"2022030514391754500_ref41","article-title":"Neural machine translation by jointly learning to align and translate","author":"Bahdanau","year":"2014","journal-title":"arXiv preprint"},{"key":"2022030514391754500_ref42","article-title":"Chromatin interaction aware gene regulatory modeling with graph attention networks","author":"Karbalayghareh","year":"2021","journal-title":"bioRxiv"},{"key":"2022030514391754500_ref43","doi-asserted-by":"crossref","DOI":"10.1101\/2020.12.10.419994","volume-title":"Fast and effective protein model refinement by deep graph neural networks","author":"Jing","year":"2020"},{"key":"2022030514391754500_ref44","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Others: biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci"},{"key":"2022030514391754500_ref45","article-title":"Unified rational protein engineering with sequence-only deep representation learning","volume":"16.12","author":"Alley","year":"2019","journal-title":"Nature 
methods"},{"key":"2022030514391754500_ref46","article-title":"Progen: language modeling for protein generation","author":"Madani","year":"2020","journal-title":"arXiv"},{"key":"2022030514391754500_ref47","article-title":"Improved protein structure prediction by deep learning irrespective of co-evolution information","volume":"1\u20139","author":"Xu","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2022030514391754500_ref48","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2022030514391754500_ref49","article-title":"Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function","volume":"37.2","author":"Villegas-Morcillo","year":"2021","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref50","first-page":"3734","volume-title":"Proceedings of the 36th International Conference on Machine Learning","author":"Lee","year":"2019"},{"key":"2022030514391754500_ref51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-2932-0","article-title":"ProteinNet: a standardized data set for machine learning of protein structure","volume":"20","author":"AlQuraishi","year":"2019","journal-title":"BMC Bioinform"},{"key":"2022030514391754500_ref52","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref53","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat 
Biotechnol"},{"key":"2022030514391754500_ref54","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-suite3 for fast remote homology detection and deep protein annotation","volume":"20","author":"Steinegger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2022030514391754500_ref55","doi-asserted-by":"crossref","first-page":"18962","DOI":"10.1038\/srep18962","article-title":"Protein secondary structure prediction using deep convolutional neural fields","volume":"6","author":"Wang","year":"2016","journal-title":"Sci Rep"},{"key":"2022030514391754500_ref56","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"UniProt consortium: UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"2022030514391754500_ref57","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1038\/s41592-019-0496-6","article-title":"Machine-learning-guided directed evolution for protein engineering","volume":"16","author":"Yang","year":"2019","journal-title":"Nat Methods"},{"key":"2022030514391754500_ref58","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/MSP.2017.2693418","article-title":"Geometric deep learning: going beyond Euclidean data","volume":"34","author":"Bronstein","year":"2017","journal-title":"IEEE Signal Process Mag"},{"key":"2022030514391754500_ref59","volume-title":"Decoupled Weight Decay Regularization","author":"Loshchilov","year":"2017"},{"key":"2022030514391754500_ref60","volume-title":"Fast Graph Representation Learning with PyTorch Geometric","author":"Fey","year":"2019"},{"key":"2022030514391754500_ref61","volume-title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","author":"Paszke","year":"2019"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab502\/42747040\/bbab502.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab502\/42747040\/bbab502.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,12]],"date-time":"2023-11-12T03:52:13Z","timestamp":1699761133000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab502\/6457163"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,9]]},"references-count":61,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1,17]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab502","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.06.16.448727","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,1]]},"published":{"date-parts":[[2021,12,9]]},"article-number":"bbab502"}}