{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T05:06:48Z","timestamp":1773551208615,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T00:00:00Z","timestamp":1750636800000},"content-version":"vor","delay-in-days":22,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,6,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>With the advancement of deep learning, researchers have increasingly proposed computational methods based on deep learning techniques to predict protein function. However, many of these methods treat protein function prediction as a multi-label classification problem, often overlooking the long-tail distribution of functional labels (i.e., Gene Ontology Terms) in datasets. To address this issue, we propose the GOBoost method, which incorporates the proposed long-tail optimization ensemble strategy. Besides, GOBoost introduces the proposed global-local label graph module and multi-granularity focal loss function to enhance long-tail functional information, mitigate the long-tail phenomenon, and improve overall prediction accuracy.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We evaluate GOBoost and other state-of-the-art (SOTA) protein function prediction methods on the PDB and AF2 datasets. The GOBoost outperformed SOTA methods on all evaluation metrics for both datasets. Notably, in the AUPR evaluation on the PDB test set, GOBoost improved by 10.71%, 35.91%, and 22.71% compared to the SOTA HEAL method in the MF, BP, and CC functions. The experimental results show the necessity and superiority of designing models from the label long-tail distribution perspective.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The source code of GOBoost is available at https:\/\/github.com\/Cao-Labs\/GOBoost.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf267","type":"journal-article","created":{"date-parts":[[2025,6,21]],"date-time":"2025-06-21T07:45:16Z","timestamp":1750491916000},"source":"Crossref","is-referenced-by-count":1,"title":["GOBoost: leveraging long-tail gene ontology terms for accurate protein function prediction"],"prefix":"10.1093","volume":"41","author":[{"given":"Lei","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Computer Science and Technology, Anhui University , Hefei, Anhui 230601,","place":["China"]}]},{"given":"Yang","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Anhui University , Hefei, Anhui 230601,","place":["China"]}]},{"given":"Xiao","family":"Chen","sequence":"additional","affiliation":[{"name":"Computer Science Department, Hamilton College , Clinton, NY 13323,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8584-5154","authenticated-orcid":false,"given":"Jie","family":"Hou","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Saint Louis University , Saint Louis, MO 63103,","place":["United States"]}]},{"given":"Dong","family":"Si","sequence":"additional","affiliation":[{"name":"Division of Computing and Software Systems, University of Washington Bothell , Bothell, WA 98011,","place":["United States"]}]},{"given":"Rui","family":"Ding","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Anhui University , Hefei, Anhui 230601,","place":["China"]}]},{"given":"Bo","family":"Jiang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Anhui University , Hefei, Anhui 230601,","place":["China"]}]},{"given":"Hailey","family":"Ledenko","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Pacific Lutheran University , Tacoma, WA 98447,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8345-343X","authenticated-orcid":false,"given":"Renzhi","family":"Cao","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Pacific Lutheran University , Tacoma, WA 98447,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,23]]},"reference":[{"key":"2025070408273223700_btaf267-B1","author":"Abdine","year":"2024"},{"key":"2025070408273223700_btaf267-B2","doi-asserted-by":"crossref","first-page":"195","DOI":"10.2165\/00822942-200504030-00004","article-title":"Feature selection and the class imbalance problem in predicting protein function from sequence","volume":"4","author":"Al-Shahib","year":"2005","journal-title":"Appl Bioinformatics"},{"key":"2025070408273223700_btaf267-B3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2025070408273223700_btaf267-B4","doi-asserted-by":"crossref","first-page":"i318","DOI":"10.1093\/bioinformatics\/btad208","article-title":"Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function","volume":"39","author":"Boadu","year":"2023","journal-title":"Bioinformatics"},{"key":"2025070408273223700_btaf267-B5","doi-asserted-by":"crossref","first-page":"e2300471","DOI":"10.1002\/pmic.202300471","article-title":"Deep learning methods for protein function prediction","volume":"25","author":"Boadu","year":"2025","journal-title":"Proteomics"},{"key":"2025070408273223700_btaf267-B6","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/978-1-59745-535-0_4","volume-title":"Plant Bioinformatics: Methods and Protocols","author":"Boutet","year":"2007"},{"key":"2025070408273223700_btaf267-B7","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1038\/s41592-021-01101-x","article-title":"Sensitive protein alignments at tree-of-life scale using diamond","volume":"18","author":"Buchfink","year":"2021","journal-title":"Nat Methods"},{"key":"2025070408273223700_btaf267-B8","doi-asserted-by":"crossref","first-page":"2825","DOI":"10.1093\/bioinformatics\/btab198","article-title":"Tale: transformer-based protein function annotation with joint sequence\u2013label embedding","volume":"37","author":"Cao","year":"2021","journal-title":"Bioinformatics"},{"key":"2025070408273223700_btaf267-B9","doi-asserted-by":"crossref","first-page":"D482","DOI":"10.1093\/nar\/gky1114","article-title":"Sifts: updated structure integration with function, taxonomy and sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins","volume":"47","author":"Dana","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025070408273223700_btaf267-B10","doi-asserted-by":"crossref","first-page":"3460","DOI":"10.1093\/bioinformatics\/btv398","article-title":"Functional classification of cath superfamilies: a domain-based approach for protein function annotation","volume":"31","author":"Das","year":"2015","journal-title":"Bioinformatics"},{"key":"2025070408273223700_btaf267-B11","first-page":"7112","article-title":"Prottrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar"},{"key":"2025070408273223700_btaf267-B12","doi-asserted-by":"crossref","first-page":"5511","DOI":"10.1038\/s41467-024-49647-6","article-title":"De novo atomic protein structure modeling for cryoem density maps using 3d transformer and hmm","volume":"15","author":"Giri","year":"2024","journal-title":"Nat Commun"},{"key":"2025070408273223700_btaf267-B13","doi-asserted-by":"crossref","first-page":"3168","DOI":"10.1038\/s41467-021-23303-9","article-title":"Structure-based protein function prediction using graph convolutional networks","volume":"12","author":"Gligorijevi\u0107","year":"2021","journal-title":"Nature Communications"},{"key":"2025070408273223700_btaf267-B14","doi-asserted-by":"crossref","first-page":"btad410","DOI":"10.1093\/bioinformatics\/btad410","article-title":"Hierarchical graph transformer with contrastive learning for protein function prediction","volume":"39","author":"Gu","year":"2023","journal-title":"Bioinformatics"},{"key":"2025070408273223700_btaf267-B15","doi-asserted-by":"crossref","first-page":"D1057","DOI":"10.1093\/nar\/gku1113","article-title":"The Goa database: gene ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025070408273223700_btaf267-B16","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025070408273223700_btaf267-B17","first-page":"65","author":"Jung","year":"2006"},{"key":"2025070408273223700_btaf267-B18","article-title":"Semi-supervised classification with graph convolutional networks","author":"Kipf","year":"2016","journal-title":"arXiv Preprint arXiv : 1609.02907"},{"key":"2025070408273223700_btaf267-B19","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1093\/bioinformatics\/btz595","article-title":"Deepgoplus: improved protein function prediction from sequence","volume":"36","author":"Kulmanov","year":"2020","journal-title":"Bioinformatics"},{"key":"2025070408273223700_btaf267-B20","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/bioinformatics\/btx624","article-title":"Deepgo: predicting protein functions from sequence and interactions using a deep ontology-aware classifier","volume":"34","author":"Kulmanov","year":"2018","journal-title":"Bioinformatics"},{"key":"2025070408273223700_btaf267-B21","doi-asserted-by":"crossref","first-page":"bbab502","DOI":"10.1093\/bib\/bbab502","article-title":"Accurate protein function prediction via graph attention networks with predicted structure information","volume":"23","author":"Lai","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025070408273223700_btaf267-B22","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1109\/TCBB.2022.3170719","article-title":"A deep learning framework for predicting protein functions with co-occurrence of go terms","volume":"20","author":"Li","year":"2023","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025070408273223700_btaf267-B23","article-title":"Language models of protein sequences at the scale of evolution enable accurate structure prediction","author":"Lin","year":"2022","journal-title":"bioRxiv"},{"key":"2025070408273223700_btaf267-B24","doi-asserted-by":"crossref","first-page":"4526","DOI":"10.1093\/bioinformatics\/btab485","article-title":"Hemdag: a family of modular and scalable hierarchical ensemble methods to improve gene ontology term prediction","volume":"37","author":"Notaro","year":"2021","journal-title":"Bioinformatics"},{"key":"2025070408273223700_btaf267-B25","doi-asserted-by":"crossref","first-page":"e1000431","DOI":"10.1371\/journal.pcbi.1000431","article-title":"The gene ontology\u2019s reference genome project: a unified framework for functional annotation across species","volume":"5","author":"Reference Genome Group of the Gene Ontology Consortium","year":"2009","journal-title":"PLoS Computational Biology"},{"key":"2025070408273223700_btaf267-B26","article-title":"Normalization: a preprocessing stage","author":"Patro","year":"2015","journal-title":"arXiv Preprint arXiv : 1503.06462"},{"key":"2025070408273223700_btaf267-B27","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1109\/TCBB.2021.3130923","article-title":"A Sub-sequence based approach to protein function prediction via multi-attention based multi-aspect network","volume":"20","author":"Ranjan","year":"2023","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025070408273223700_btaf267-B28","author":"Ridnik","year":"2021"},{"key":"2025070408273223700_btaf267-B29","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"2025070408273223700_btaf267-B30","first-page":"2980","author":"Ross","year":"2017"},{"key":"2025070408273223700_btaf267-B31","doi-asserted-by":"crossref","first-page":"2542","DOI":"10.1038\/s41467-018-04964-5","article-title":"Clustering huge protein sequence sets in linear time","volume":"9","author":"Steinegger","year":"2018","journal-title":"Nat Commun"},{"key":"2025070408273223700_btaf267-B32","doi-asserted-by":"crossref","first-page":"e1005324","DOI":"10.1371\/journal.pcbi.1005324","article-title":"Accurate de novo prediction of protein contact map by ultra-deep learning model","volume":"13","author":"Wang","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2025070408273223700_btaf267-B33","first-page":"649","author":"Ye","year":"2020"},{"key":"2025070408273223700_btaf267-B34","doi-asserted-by":"crossref","first-page":"bbad117","DOI":"10.1093\/bib\/bbad117","article-title":"Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion","volume":"24","author":"Yuan","year":"2023","journal-title":"Brief Bioinform"},{"key":"2025070408273223700_btaf267-B35","author":"Zhang","year":"2023"},{"key":"2025070408273223700_btaf267-B36","doi-asserted-by":"crossref","first-page":"lqac004","DOI":"10.1093\/nargab\/lqac004","article-title":"Panda2: protein function prediction using graph neural networks","volume":"4","author":"Zhao","year":"2022","journal-title":"NAR Genom Bioinform"},{"key":"2025070408273223700_btaf267-B37","doi-asserted-by":"crossref","first-page":"lqae094","DOI":"10.1093\/nargab\/lqae094","article-title":"Panda-3d: protein function prediction based on alphafold models","volume":"6","author":"Zhao","year":"2024","journal-title":"NAR Genom Bioinform"},{"key":"2025070408273223700_btaf267-B38","author":"Zhou","year":"2016"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf267\/63552642\/btaf267.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf267\/63552642\/btaf267.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/6\/btaf267\/63552642\/btaf267.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,4]],"date-time":"2025-07-04T08:27:44Z","timestamp":1751617664000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf267\/8171988"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6]]},"references-count":38,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf267","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.11.16.623961","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,6]]},"published":{"date-parts":[[2025,6]]},"article-number":"btaf267"}}