{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T00:07:24Z","timestamp":1775606844070,"version":"3.50.1"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2022,6,27]],"date-time":"2022-06-27T00:00:00Z","timestamp":1656288000000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Office of Sponsored Research","award":["URF\/1\/4355-01-01"],"award-info":[{"award-number":["URF\/1\/4355-01-01"]}]},{"name":"Office of Sponsored Research","award":["URF\/1\/4675-01-01"],"award-info":[{"award-number":["URF\/1\/4675-01-01"]}]},{"name":"Office of Sponsored Research","award":["FCC\/1\/1976-34-01"],"award-info":[{"award-number":["FCC\/1\/1976-34-01"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,24]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Protein functions are often described using the Gene Ontology (GO) which is an ontology consisting of over 50 000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have only few or no experimental annotations.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>http:\/\/github.com\/bio-ontology-research-group\/deepgozero.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac256","type":"journal-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T11:10:15Z","timestamp":1649934615000},"page":"i238-i245","source":"Crossref","is-referenced-by-count":78,"title":["DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1710-1820","authenticated-orcid":false,"given":"Maxat","family":"Kulmanov","sequence":"first","affiliation":[{"name":"Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8149-5890","authenticated-orcid":false,"given":"Robert","family":"Hoehndorf","sequence":"additional","affiliation":[{"name":"Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia"}]}],"member":"286","published-online":{"date-parts":[[2022,6,27]]},"reference":[{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2023041407562013000_","volume-title":"The Description Logic Handbook: Theory, Implementation and Applications","author":"Baader","year":"2003"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nmeth.3176","article-title":"Fast and sensitive protein alignment using diamond","volume":"12","author":"Buchfink","year":"2015","journal-title":"Nat. Methods"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"2825","DOI":"10.1093\/bioinformatics\/btab198","article-title":"TALE: transformer-based protein function annotation with joint sequence\u2013label embedding","volume":"37","author":"Cao","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041407562013000_","first-page":"233","author":"Davis","year":"2006"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"baab069","DOI":"10.1093\/database\/baab069","article-title":"OBO foundry in 2021: operationalizing open data principles to evaluate ontologies","volume":"2021","author":"Jackson","year":"2021","journal-title":"Database"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1186\/s13059-016-1037-6","article-title":"An expanded evaluation of protein function prediction methods shows an improvement in accuracy","volume":"17","author":"Jiang","year":"2016","journal-title":"Genome Biol"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023041407562013000_","author":"Kingma","year":"2014"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1093\/bioinformatics\/btz595","article-title":"DeepGOPlus: improved protein function prediction from sequence","volume":"36","author":"Kulmanov","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/bioinformatics\/btx624","article-title":"DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier","volume":"34","author":"Kulmanov","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041407562013000_","first-page":"6103","author":"Kulmanov","year":"2019"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"bbaa199","DOI":"10.1093\/bib\/bbaa199","article-title":"Semantic similarity and machine learning with ontologies","volume":"22","author":"Kulmanov","year":"2021","journal-title":"Brief. Bioinformatics"},{"key":"2023041407562013000_","author":"Mendez","year":"2012"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"460","DOI":"10.1007\/978-3-540-45210-2_42","volume-title":"Computer Aided Systems Theory - EUROCAST 2003","author":"Mira","year":"2003"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.1093\/bioinformatics\/btu031","article-title":"InterProScan 5: genome-scale protein function classification","volume":"30","author":"Mitchell","year":"2014","journal-title":"Bioinformatics"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"11769343211050067","DOI":"10.1177\/11769343211050067","article-title":"Sequence-based prediction of plant protein-protein interactions by combining discrete sine transformation with rotation forest","volume":"17","author":"Pan","year":"2021","journal-title":"Evol. Bioinform. Online"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"i53","DOI":"10.1093\/bioinformatics\/btt228","article-title":"Information-theoretic evaluation of predicted ontological annotations","volume":"29","author":"Radivojac","year":"2013","journal-title":"Bioinformatics"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat. Methods"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1016\/j.cels.2021.08.010","article-title":"D-script translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions","volume":"12","author":"Sledzieski","year":"2021","journal-title":"Cell Syst"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2005-6-5-r46","article-title":"Relations in biomedical ontologies","volume":"6","author":"Smith","year":"2005","journal-title":"Genome Biol"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1038\/nbt1346","article-title":"The OBO foundry: coordinated evolution of ontologies to support biomedical data integration","volume":"25","author":"Smith","year":"2007","journal-title":"Nat. Biotechnol"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"826","DOI":"10.1021\/ci00027a006","article-title":"Neural network studies, 1. Comparison of overfitting and overtraining","volume":"35","author":"Tetko","year":"1995","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023041407562013000_","first-page":"D330","article-title":"The gene ontology resource: 20 years and still GOing strong","volume":"47","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023041407562013000_","first-page":"D506","article-title":"UniProt: a worldwide hub of protein knowledge","volume":"47","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"W469","DOI":"10.1093\/nar\/gkab398","article-title":"NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information","volume":"49","author":"Yao","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"2465","DOI":"10.1093\/bioinformatics\/bty130","article-title":"GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank","volume":"34","author":"You","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041407562013000_","doi-asserted-by":"crossref","first-page":"i262","DOI":"10.1093\/bioinformatics\/btab270","article-title":"DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction","volume":"37","author":"You","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041407562013000_","first-page":"244","volume-title":"Genome Biol.","author":"Zhou","year":"2019"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i238\/49887022\/btac256.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i238\/49887022\/btac256.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,20]],"date-time":"2023-11-20T00:28:29Z","timestamp":1700440109000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_1\/i238\/6617515"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,24]]},"references-count":30,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2022,6,24]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac256","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,1]]},"published":{"date-parts":[[2022,6,24]]}}}