{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T21:10:05Z","timestamp":1775077805993,"version":"3.50.1"},"reference-count":90,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T00:00:00Z","timestamp":1737676800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,2,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>In silico functional annotation of proteins is crucial to narrowing the sequencing-accelerated gap in our understanding of protein activities. Numerous function annotation methods exist, and their ranks have been growing, particularly so with the recent deep learning-based developments. However, it is unclear if these tools are truly predictive. As we are not aware of any methods that can identify new terms in functional ontologies, we ask if they can, at least, identify molecular functions of proteins that are non-homologous to or far-removed from known protein families.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we explore the potential and limitations of the existing methods in predicting the molecular functions of thousands of such proteins. 
Lacking the \u201cground truth\u201d functional annotations, we transformed the assessment of function prediction into evaluation of functional similarity of protein pairs that likely share function but are unlike any of the currently functionally annotated sequences. Notably, our approach transcends the limitations of functional annotation vocabularies, providing a means to assess different-ontology annotation methods. We find that most existing methods are limited to identifying functional similarity of homologous sequences and fail to predict the function of proteins lacking reference. Curiously, despite their seemingly unlimited by-homology scope, deep learning methods also have trouble capturing the functional signal encoded in protein sequence. We believe that our work will inspire the development of a new generation of methods that push boundaries and promote exploration and discovery in the molecular function domain.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The data underlying this article are available at https:\/\/doi.org\/10.6084\/m9.figshare.c.6737127.v3. 
The code used to compute siblings is available openly at https:\/\/bitbucket.org\/bromberglab\/siblings-detector\/.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf035","type":"journal-article","created":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T18:25:23Z","timestamp":1737743123000},"source":"Crossref","is-referenced-by-count":5,"title":["Functional profiling of the sequence stockpile: a protein pair-based assessment of <i>in silico<\/i> prediction tools"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0172-3716","authenticated-orcid":false,"given":"R","family":"Prabakaran","sequence":"first","affiliation":[{"name":"Department of Biology, Emory University , Atlanta, GA 30322,","place":["United States"]},{"name":"Department of Computer Science, Emory University , Atlanta, GA 30322,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8351-0844","authenticated-orcid":false,"given":"Yana","family":"Bromberg","sequence":"additional","affiliation":[{"name":"Department of Biology, Emory University , Atlanta, GA 30322,","place":["United States"]},{"name":"Department of Computer Science, Emory University , Atlanta, GA 30322,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,24]]},"reference":[{"key":"2025030422243781900_btaf035-B1","doi-asserted-by":"crossref","first-page":"e1000700","DOI":"10.1371\/journal.pcbi.1000700","article-title":"Quantitative comparison of catalytic mechanisms and overall reactions in convergently evolved enzymes: implications for classification of enzyme function","volume":"6","author":"Almonacid","year":"2010","journal-title":"PLoS Comput Biol"},{"key":"2025030422243781900_btaf035-B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.cbpa.2021.04.005","article-title":"Machine learning in protein structure prediction","volume":"65","author":"AlQuraishi","year":"2021","journal-title":"Curr Opin Chem 
Biol"},{"key":"2025030422243781900_btaf035-B3","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2025030422243781900_btaf035-B4","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1016\/S0968-0004(98)01298-5","article-title":"Iterated profile searches with PSI-BLAST\u2014a tool for discovery in protein databases","volume":"23","author":"Altschul","year":"1998","journal-title":"Trends Biochem Sci"},{"key":"2025030422243781900_btaf035-B5","doi-asserted-by":"crossref","first-page":"2251","DOI":"10.1093\/bioinformatics\/btz859","article-title":"KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold","volume":"36","author":"Aramaki","year":"2020","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B6","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. 
The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"key":"2025030422243781900_btaf035-B7","author":"Bepler"},{"key":"2025030422243781900_btaf035-B8","doi-asserted-by":"crossref","first-page":"932","DOI":"10.1038\/s41587-021-01179-w","article-title":"Using deep learning to annotate the protein universe","volume":"40","author":"Bileschi","year":"2022","journal-title":"Nat Biotechnol"},{"key":"2025030422243781900_btaf035-B9","doi-asserted-by":"crossref","first-page":"2102","DOI":"10.1093\/bioinformatics\/btac020","article-title":"ProteinBERT: a universal deep-learning model of protein sequence and function","volume":"38","author":"Brandes","year":"2022","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B10","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1038\/s41592-021-01101-x","article-title":"Sensitive protein alignments at tree-of-life scale using DIAMOND","volume":"18","author":"Buchfink","year":"2021","journal-title":"Nat Methods"},{"key":"2025030422243781900_btaf035-B11","doi-asserted-by":"crossref","first-page":"D262","DOI":"10.1093\/nar\/gkh021","article-title":"The Gene Ontology Annotation (GOA) database: sharing knowledge in Uniprot with Gene Ontology","volume":"32","author":"Camon","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B12","first-page":"481","article-title":"Aligning two sequences within a specified diagonal band","volume":"8","author":"Chao","year":"1992","journal-title":"Comput Appl Biosci"},{"key":"2025030422243781900_btaf035-B13","doi-asserted-by":"crossref","first-page":"1519","DOI":"10.1002\/humu.23875","article-title":"Assessment of predicted enzymatic activity of alpha-N-acetylglucosaminidase variants of unknown significance for CAGI 2016","volume":"40","author":"Clark","year":"2019","journal-title":"Hum 
Mutat"},{"key":"2025030422243781900_btaf035-B14","doi-asserted-by":"crossref","first-page":"i53","DOI":"10.1093\/bioinformatics\/btt228","article-title":"Information-theoretic evaluation of predicted ontological annotations","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B15","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1186\/s12859-018-2368-y","article-title":"ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature","volume":"19","author":"Dalkiran","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2025030422243781900_btaf035-B16","doi-asserted-by":"crossref","first-page":"e113","DOI":"10.1002\/cpz1.113","article-title":"Learned embeddings from deep learning to visualize and predict protein sets","volume":"1","author":"Dallago","year":"2021","journal-title":"Curr Protoc"},{"key":"2025030422243781900_btaf035-B17","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1137\/040608635","article-title":"Graph clustering via a discrete uncoupling process","volume":"30","author":"Dongen","year":"2008","journal-title":"SIAM J Matrix Anal Appl"},{"key":"2025030422243781900_btaf035-B18","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/0968-0004(94)90167-8","article-title":"Convergent evolution: the need to be explicit","volume":"19","author":"Doolittle","year":"1994","journal-title":"Trends Biochem Sci"},{"key":"2025030422243781900_btaf035-B19","first-page":"205","article-title":"A new generation of homology search tools based on probabilistic inference","volume":"23","author":"Eddy","year":"2009","journal-title":"Genome Inform"},{"key":"2025030422243781900_btaf035-B20","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput 
Biol"},{"key":"2025030422243781900_btaf035-B21","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: Toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2025030422243781900_btaf035-B22","doi-asserted-by":"crossref","first-page":"bbac232","DOI":"10.1093\/bib\/bbac232","article-title":"Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks","volume":"23","author":"Fenoy","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025030422243781900_btaf035-B23","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1038\/nrg3456","article-title":"Functional and evolutionary implications of gene orthology","volume":"14","author":"Gabaldon","year":"2013","journal-title":"Nat Rev Genet"},{"key":"2025030422243781900_btaf035-B24","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1093\/bib\/bbx117","article-title":"Microbial genome analysis: the COG approach","volume":"20","author":"Galperin","year":"2019","journal-title":"Brief Bioinform"},{"key":"2025030422243781900_btaf035-B25","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1101\/gr.8.8.779","article-title":"Analogous enzymes: independent inventions in enzyme evolution","volume":"8","author":"Galperin","year":"1998","journal-title":"Genome Res"},{"key":"2025030422243781900_btaf035-B26","doi-asserted-by":"crossref","first-page":"2446","DOI":"10.1093\/nar\/gkz030","article-title":"The y-ome defines the 35% of Escherichia coli genes that lack experimental evidence of function","volume":"47","author":"Ghatak","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B27","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/bfgp\/eln030","article-title":"Structure-based function prediction: approaches and 
applications","volume":"7","author":"Gherardini","year":"2008","journal-title":"Brief Funct Genomic Proteomic"},{"key":"2025030422243781900_btaf035-B28","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1016\/j.jmb.2007.06.017","article-title":"Convergent evolution of enzyme active sites is not a rare phenomenon","volume":"372","author":"Gherardini","year":"2007","journal-title":"J Mol Biol"},{"key":"2025030422243781900_btaf035-B29","doi-asserted-by":"publisher","first-page":"3168","DOI":"10.1038\/s41467-021-23303-9","article-title":"Structure-based protein function prediction using graph convolutional networks","volume":"12","author":"Gligorijevi\u0107","year":"2021","journal-title":"Nat Commun"},{"key":"2025030422243781900_btaf035-B30","doi-asserted-by":"crossref","first-page":"1323","DOI":"10.1093\/bioinformatics\/btw006","article-title":"MMseqs software suite for fast and deep clustering and searching of large protein sequence sets","volume":"32","author":"Hauser","year":"2016","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B31","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1186\/s12859-019-3220-8","article-title":"Modeling aspects of the language of life through transfer-learning protein sequences","volume":"20","author":"Heinzinger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2025030422243781900_btaf035-B32","doi-asserted-by":"crossref","first-page":"2606","DOI":"10.1038\/s41467-022-30070-8","article-title":"Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter","volume":"13","author":"Hoarfrost","year":"2022","journal-title":"Nat Commun"},{"key":"2025030422243781900_btaf035-B33","doi-asserted-by":"crossref","first-page":"giz118","DOI":"10.1093\/gigascience\/giz118","article-title":"SwiftOrtho: a fast, memory-efficient, multiple genome orthology 
classifier","volume":"8","author":"Hu","year":"2019","journal-title":"Gigascience"},{"key":"2025030422243781900_btaf035-B34","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-024-02523-z","article-title":"The nucleotide transformer: building and evaluating robust foundation models for human genomics","volume-title":"Nat Methods","author":"Hugo"},{"key":"2025030422243781900_btaf035-B35","doi-asserted-by":"crossref","first-page":"D1057","DOI":"10.1093\/nar\/gku1113","article-title":"The GOA database: gene ontology annotation updates for 2015","volume":"43","author":"Huntley","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B36","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1186\/s13059-016-1037-6","article-title":"An expanded evaluation of protein function prediction methods shows an improvement in accuracy","volume":"17","author":"Jiang","year":"2016","journal-title":"Genome Biol"},{"key":"2025030422243781900_btaf035-B37","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.1093\/bioinformatics\/btu031","article-title":"InterProScan 5: genome-scale protein function classification","volume":"30","author":"Jones","year":"2014","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B38","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025030422243781900_btaf035-B39","doi-asserted-by":"crossref","first-page":"1709","DOI":"10.3390\/biom12111709","article-title":"GOProFormer: a Multi-Modal transformer method for gene ontology protein function prediction","volume":"12","author":"Kabir","year":"2022","journal-title":"Biomolecules"},{"key":"2025030422243781900_btaf035-B40","doi-asserted-by":"crossref","first-page":"D587","DOI":"10.1093\/nar\/gkac963","article-title":"KEGG for taxonomy-based analysis of pathways 
and genomes","volume":"51","author":"Kanehisa","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B41","doi-asserted-by":"crossref","first-page":"D457","DOI":"10.1093\/nar\/gkv1070","article-title":"KEGG as a reference resource for gene and protein annotation","volume":"44","author":"Kanehisa","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B42","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1016\/j.jmb.2015.11.006","article-title":"BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences","volume":"428","author":"Kanehisa","year":"2016","journal-title":"J Mol Biol"},{"key":"2025030422243781900_btaf035-B43","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1016\/j.sbi.2006.04.007","article-title":"Protein structure comparison: implications for the nature of 'fold space', and structure and function prediction","volume":"16","author":"Kolodny","year":"2006","journal-title":"Curr Opin Struct Biol"},{"key":"2025030422243781900_btaf035-B44","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1093\/bioinformatics\/btz595","article-title":"DeepGOPlus: improved protein function prediction from sequence","volume":"36","author":"Kulmanov","year":"2020","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B45","doi-asserted-by":"crossref","first-page":"W140","DOI":"10.1093\/nar\/gkab373","article-title":"DeepGOWeb: fast and accurate protein function prediction on the (semantic) web","volume":"49","author":"Kulmanov","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B46","article-title":"Positive-unlabeled learning in bioinformatics and computational biology: a brief review","volume":"23","author":"Li","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025030422243781900_btaf035-B47","article-title":"Pretrained protein language model transfer learning: is the final layer 
representation what we want?","volume":"2022","author":"Li","year":"2022","journal-title":"Mach Learn Struct Biol Workshop NeurIPS"},{"key":"2025030422243781900_btaf035-B48","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B49","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025030422243781900_btaf035-B50","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1038\/s41598-020-80786-0","article-title":"Embeddings from deep learning transfer GO annotations beyond homology","volume":"11","author":"Littmann","year":"2021","journal-title":"Sci Rep"},{"key":"2025030422243781900_btaf035-B51","doi-asserted-by":"crossref","DOI":"10.1101\/2020.09.04.283929","article-title":"Self-Supervised contrastive learning of protein representations by mutual information maximization","author":"Lu","year":"2020"},{"key":"2025030422243781900_btaf035-B52","doi-asserted-by":"crossref","first-page":"i304","DOI":"10.1093\/bioinformatics\/bty262","article-title":"HFSP: high speed homology-driven function annotation of proteins","volume":"34","author":"Mahlich","year":"2018","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B53","doi-asserted-by":"crossref","first-page":"10162","DOI":"10.1093\/nar\/gkad757","article-title":"Learning from the unknown: exploring the range of bacterial functionality","volume":"51","author":"Mahlich","year":"2023","journal-title":"Nucleic Acids 
Res"},{"key":"2025030422243781900_btaf035-B54","doi-asserted-by":"crossref","first-page":"1264","DOI":"10.3390\/genes11111264","article-title":"Automatic gene function prediction in the 2020's","volume":"11","author":"Makrodimitris","year":"2020","journal-title":"Genes (Basel)"},{"key":"2025030422243781900_btaf035-B55","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/s13059-019-1649-8","article-title":"Improving the usability and archival stability of bioinformatics software","volume":"20","author":"Mangul","year":"2019","journal-title":"Genome Biol"},{"key":"2025030422243781900_btaf035-B56","doi-asserted-by":"crossref","first-page":"2722","DOI":"10.1093\/bioinformatics\/btt473","article-title":"lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests","volume":"29","author":"Mariani","year":"2013","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B57","doi-asserted-by":"crossref","first-page":"2214","DOI":"10.1111\/febs.16274","article-title":"Enzyme nomenclature and classification: the state of the art","volume":"290","author":"McDonald","year":"2023","journal-title":"FEBS J"},{"key":"2025030422243781900_btaf035-B58","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1002\/bies.201300066","article-title":"What is the total number of protein molecules per cell volume? 
A call to rethink some published values","volume":"35","author":"Milo","year":"2013","journal-title":"Bioessays"},{"key":"2025030422243781900_btaf035-B59","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B60","first-page":"D570","article-title":"MGnify: the microbiome analysis resource in 2020","volume":"48","author":"Mitchell","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B61","doi-asserted-by":"crossref","first-page":"e1002073","DOI":"10.1371\/journal.pcbi.1002073","article-title":"Testing the ortholog conjecture with comparative functional genomic data from mammals","volume":"7","author":"Nehrt","year":"2011","journal-title":"PLoS Comput Biol"},{"key":"2025030422243781900_btaf035-B62","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1186\/1745-6150-5-31","article-title":"Non-homologous isofunctional enzymes: a systematic analysis of alternative solutions in enzyme evolution","volume":"5","author":"Omelchenko","year":"2010","journal-title":"Biol Direct"},{"key":"2025030422243781900_btaf035-B63","doi-asserted-by":"crossref","first-page":"W200","DOI":"10.1093\/nar\/gky448","article-title":"HMMER web server: 2018 update","volume":"46","author":"Potter","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B64","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1515\/hsz-2021-0125","article-title":"Unification of functional annotation descriptions using text mining","volume":"402","author":"Queir\u00f3s","year":"2021","journal-title":"Biol Chem"},{"key":"2025030422243781900_btaf035-B65","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function 
prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat Methods"},{"key":"2025030422243781900_btaf035-B66","doi-asserted-by":"crossref","first-page":"vbac057","DOI":"10.1093\/bioadv\/vbac057","article-title":"The field of protein function prediction as viewed by different domain scientists","volume":"2","author":"Ramola","year":"2022","journal-title":"Bioinform Adv"},{"key":"2025030422243781900_btaf035-B67","first-page":"124","article-title":"Estimating classification accuracy in positive-unlabeled learning: characterization and correction strategies","volume":"24","author":"Ramola","year":"2019","journal-title":"Pac Symp Biocomput"},{"key":"2025030422243781900_btaf035-B68","doi-asserted-by":"crossref","first-page":"D461","DOI":"10.1093\/nar\/gkaa1004","article-title":"The transporter classification database (TCDB): 2021 update","volume":"49","author":"Saier","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B69","doi-asserted-by":"crossref","first-page":"2732","DOI":"10.1021\/bi002272k","article-title":"Structural studies of duck Delta 1 and Delta 2 crystallin suggest conformational changes occur during catalysis","volume":"40","author":"Sampaleanu","year":"2001","journal-title":"Biochemistry"},{"key":"2025030422243781900_btaf035-B70","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.80942","article-title":"ProteInfer, deep neural networks for protein functional inference","volume":"12","author":"Sanderson","year":"2023","journal-title":"Elife"},{"key":"2025030422243781900_btaf035-B71","doi-asserted-by":"crossref","first-page":"D141","DOI":"10.1093\/nar\/gkac1012","article-title":"GenBank 2023 update","volume":"51","author":"Sayers","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B72","doi-asserted-by":"crossref","first-page":"e1003063","DOI":"10.1371\/journal.pcbi.1003063","article-title":"Biases in the experimental annotations of protein function and their 
effect on our understanding of protein function space","volume":"9","author":"Schnoes","year":"2013","journal-title":"PLoS Comput Biol"},{"key":"2025030422243781900_btaf035-B73","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B74","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1146\/annurev.genet.38.072902.092831","article-title":"Duplication and divergence: the evolution of new genes and old ideas","volume":"38","author":"Taylor","year":"2004","journal-title":"Annu Rev Genet"},{"key":"2025030422243781900_btaf035-B75","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.1432-1033.1994.tb18960.x","article-title":"Nomenclature committee of the international union of biochemistry and molecular biology (NC-IUBMB). enzyme nomenclature. Recommendations 1992. 
Supplement: corrections and additions","volume":"223","author":"Tipton","year":"1994","journal-title":"Eur J Biochem"},{"key":"2025030422243781900_btaf035-B76","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1006\/jmbi.2001.4513","article-title":"Evolution of function in protein superfamilies, from a structural perspective","volume":"307","author":"Todd","year":"2001","journal-title":"J Mol Biol"},{"key":"2025030422243781900_btaf035-B77","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1016\/S0969-2126(02)00861-4","article-title":"Sequence and structural differences between enzyme and nonenzyme homologs","volume":"10","author":"Todd","year":"2002","journal-title":"Structure"},{"key":"2025030422243781900_btaf035-B78","doi-asserted-by":"crossref","first-page":"D523","DOI":"10.1093\/nar\/gkac1052","article-title":"UniProt: the universal protein knowledgebase in 2023","volume":"51","author":"UniProt","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B79","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1038\/s41587-023-01773-0","article-title":"Fast and accurate protein structure search with foldseek","volume":"42","author":"Van Kempen","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025030422243781900_btaf035-B80","doi-asserted-by":"crossref","first-page":"e67667","DOI":"10.7554\/eLife.67667","article-title":"Unifying the known and unknown microbial coding sequence space","volume":"11","author":"Vanni","year":"2022","journal-title":"Elife"},{"key":"2025030422243781900_btaf035-B81","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neur In"},{"key":"2025030422243781900_btaf035-B82","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: fundamental algorithms for scientific computing in 
Python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat Methods"},{"key":"2025030422243781900_btaf035-B83","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/bioinformatics\/btm087","article-title":"A new method to measure the semantic similarity of GO terms","volume":"23","author":"Wang","year":"2007","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B84","doi-asserted-by":"crossref","first-page":"3497","DOI":"10.1074\/mcp.M113.037309","article-title":"A \u201cproteomic ruler\u201d for protein copy number and concentration estimation without spike-in standards","volume":"13","author":"Wi\u015bniewski","year":"2014","journal-title":"Mol Cell Proteomics"},{"key":"2025030422243781900_btaf035-B85","doi-asserted-by":"crossref","first-page":"889","DOI":"10.1093\/bioinformatics\/btq066","article-title":"How significant is a protein structure similarity with TM-score = 0.5?","volume":"26","author":"Xu","year":"2010","journal-title":"Bioinformatics"},{"key":"2025030422243781900_btaf035-B86","doi-asserted-by":"crossref","first-page":"W469","DOI":"10.1093\/nar\/gkab398","article-title":"NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information","volume":"49","author":"Yao","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025030422243781900_btaf035-B87","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/prot.20264","article-title":"Scoring function for automated assessment of protein structure template quality","volume":"57","author":"Zhang","year":"2004","journal-title":"Proteins"},{"key":"2025030422243781900_btaf035-B88","doi-asserted-by":"crossref","first-page":"15107","DOI":"10.1038\/s41598-018-33219-y","article-title":"GOGO: an improved algorithm to measure the semantic similarity between gene ontology terms","volume":"8","author":"Zhao","year":"2018","journal-title":"Sci 
Rep"},{"key":"2025030422243781900_btaf035-B89","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1186\/s13059-019-1835-8","article-title":"The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens","volume":"20","author":"Zhou","year":"2019","journal-title":"Genome Biol"},{"key":"2025030422243781900_btaf035-B90","doi-asserted-by":"crossref","first-page":"e23","DOI":"10.1093\/nar\/gkx1209","article-title":"Functional sequencing read annotation for high precision microbiome analysis","volume":"46","author":"Zhu","year":"2018","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf035\/61621732\/btaf035.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf035\/61621732\/btaf035.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/2\/btaf035\/61621732\/btaf035.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T22:25:07Z","timestamp":1741127107000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf035\/7978914"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,1,24]]},"references-count":90,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf035","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"136
7-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,2]]},"published":{"date-parts":[[2025,1,24]]},"article-number":"btaf035"}}