{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T14:09:04Z","timestamp":1774879744865,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2024,7,23]],"date-time":"2024-07-23T00:00:00Z","timestamp":1721692800000},"content-version":"vor","delay-in-days":61,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","award":["AI134678"],"award-info":[{"award-number":["AI134678"]}],"id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2138259"],"award-info":[{"award-number":["2138259"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2138286"],"award-info":[{"award-number":["2138286"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2138307"],"award-info":[{"award-number":["2138307"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2137603"],"award-info":[{"award-number":["2137603"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science 
Foundation","doi-asserted-by":"publisher","award":["2138296"],"award-info":[{"award-number":["2138296"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND\u2014one of the most popular tools for function prediction\u2014under default search parameters. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO prediction from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. 
This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.<\/jats:p>","DOI":"10.1093\/bib\/bbae349","type":"journal-article","created":{"date-parts":[[2024,7,5]],"date-time":"2024-07-05T00:19:48Z","timestamp":1720138788000},"source":"Crossref","is-referenced-by-count":11,"title":["A large-scale assessment of sequence database search tools for homology-based protein function prediction"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7290-1324","authenticated-orcid":false,"given":"Chengxin","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, Department of Biological Chemistry, 100 Washtenaw Avenue, Ann Arbor, MI 48109, United States"},{"name":"University of Michigan, Department of Biological Chemistry, 100 Washtenaw Avenue, Ann Arbor, MI 48109, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5821-4226","authenticated-orcid":false,"given":"Lydia","family":"Freddolino","sequence":"additional","affiliation":[{"name":"Department of Computational Medicine and Bioinformatics, Department of Biological Chemistry, 100 Washtenaw Avenue, Ann Arbor, MI 48109, United States"},{"name":"University of Michigan, Department of Biological Chemistry, 100 Washtenaw Avenue, Ann Arbor, MI 48109, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,7,22]]},"reference":[{"key":"2024072300043417000_ref1","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1186\/1471-2105-5-178","article-title":"GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes","volume":"5","author":"Martin","year":"2004","journal-title":"BMC 
Bioinformatics"},{"key":"2024072300043417000_ref2","doi-asserted-by":"crossref","first-page":"619832","DOI":"10.1155\/2008\/619832","article-title":"Blast2GO: a comprehensive suite for functional analysis in plant genomics","volume":"2008","author":"Conesa","year":"2008","journal-title":"Int J Plant Genomics"},{"key":"2024072300043417000_ref3","doi-asserted-by":"crossref","first-page":"798","DOI":"10.1093\/bioinformatics\/btn037","article-title":"ConFunc\u2014functional annotation in the twilight zone","volume":"24","author":"Wass","year":"2008","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref4","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1002\/prot.22172","article-title":"PFP: automated prediction of gene ontology functional annotations with confidence scores using protein sequence data","volume":"74","author":"Hawkins","year":"2009","journal-title":"Proteins"},{"key":"2024072300043417000_ref5","doi-asserted-by":"crossref","first-page":"W197","DOI":"10.1093\/nar\/gkr292","article-title":"BAR-PLUS: the bologna annotation resource plus for functional and structural annotation of protein sequences","volume":"39","author":"Piovesan","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2024072300043417000_ref6","doi-asserted-by":"crossref","first-page":"W466","DOI":"10.1093\/nar\/gks489","article-title":"CombFunc: predicting protein function using heterogeneous data sources","volume":"40","author":"Wass","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2024072300043417000_ref7","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.ymeth.2015.08.009","article-title":"GoFDR: a sequence alignment based method for predicting protein functions","volume":"93","author":"Gong","year":"2016","journal-title":"Methods"},{"key":"2024072300043417000_ref8","doi-asserted-by":"crossref","first-page":"W291","DOI":"10.1093\/nar\/gkx366","article-title":"COFACTOR: improved protein function prediction by combining structure, 
sequence and protein-protein interaction information","volume":"45","author":"Zhang","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2024072300043417000_ref9","doi-asserted-by":"crossref","first-page":"2256","DOI":"10.1016\/j.jmb.2018.03.004","article-title":"MetaGO: predicting gene ontology of non-homologous proteins through low-resolution protein structure prediction and protein protein network mapping","volume":"430","author":"Zhang","year":"2018","journal-title":"J Mol Biol"},{"key":"2024072300043417000_ref10","doi-asserted-by":"crossref","first-page":"i304","DOI":"10.1093\/bioinformatics\/bty262","article-title":"HFSP: high speed homology-driven function annotation of proteins","volume":"34","author":"Mahlich","year":"2018","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref11","doi-asserted-by":"crossref","first-page":"1187","DOI":"10.1093\/bioinformatics\/btaa763","article-title":"DeepGOPlus: improved protein function prediction from sequence","volume":"37","author":"Kulmanov","year":"2020","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref12","doi-asserted-by":"crossref","first-page":"2825","DOI":"10.1093\/bioinformatics\/btab198","article-title":"TALE: transformer-based protein function annotation with joint sequence-label embedding","volume":"37","author":"Cao","year":"2021","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref13","doi-asserted-by":"crossref","first-page":"e1010793","DOI":"10.1371\/journal.pcbi.1010793","article-title":"Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction","volume":"18","author":"Zhu","year":"2022","journal-title":"PLoS Comput Biol"},{"key":"2024072300043417000_ref14","doi-asserted-by":"crossref","first-page":"i238","DOI":"10.1093\/bioinformatics\/btac256","article-title":"DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology 
axioms","volume":"38","author":"Kulmanov","year":"2022","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref15","doi-asserted-by":"crossref","first-page":"12","DOI":"10.7554\/eLife.80942","article-title":"ProteInfer, deep neural networks for protein functional inference","volume":"12","author":"Sanderson","year":"2023","journal-title":"Elife"},{"key":"2024072300043417000_ref16","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1093\/bib\/bbad117","article-title":"Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion","volume":"24","author":"Yuan","year":"2023","journal-title":"Brief Bioinform"},{"key":"2024072300043417000_ref17","doi-asserted-by":"crossref","first-page":"2465","DOI":"10.1093\/bioinformatics\/bty130","article-title":"GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank","volume":"34","author":"You","year":"2018","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref18","doi-asserted-by":"crossref","first-page":"W379","DOI":"10.1093\/nar\/gkz388","article-title":"NetGO: improving large-scale protein function prediction with massive network information","volume":"47","author":"You","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2024072300043417000_ref19","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2024072300043417000_ref20","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1038\/s41592-021-01101-x","article-title":"Sensitive protein alignments at tree-of-life scale using DIAMOND","volume":"18","author":"Buchfink","year":"2021","journal-title":"Nat 
Methods"},{"key":"2024072300043417000_ref21","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1093\/bioinformatics\/btr595","article-title":"RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data","volume":"28","author":"Zhao","year":"2012","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref22","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2024072300043417000_ref23","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1007\/978-1-4939-7015-5_2","article-title":"GHOSTX: a fast sequence homology search tool for functional annotation of metagenomic data","volume":"1611","author":"Suzuki","year":"2017","journal-title":"Methods Mol Biol"},{"key":"2024072300043417000_ref24","doi-asserted-by":"crossref","first-page":"2105","DOI":"10.1093\/bioinformatics\/btz863","article-title":"DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins","volume":"36","author":"Zhang","year":"2020","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref25","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2024072300043417000_ref26","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1038\/s41592-023-02130-4","article-title":"Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data","volume":"21","author":"Zheng","year":"2024","journal-title":"Nat 
Methods"},{"key":"2024072300043417000_ref27","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref28","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nmeth.1818","article-title":"HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment","volume":"9","author":"Remmert","year":"2012","journal-title":"Nat Methods"},{"key":"2024072300043417000_ref29","doi-asserted-by":"crossref","first-page":"2775","DOI":"10.1038\/s41467-024-46808-5","article-title":"PLMSearch: protein language model powers accurate and fast sequence search for remote homology","volume":"15","author":"Liu","year":"2024","journal-title":"Nat Commun"},{"key":"2024072300043417000_ref30","first-page":"1145","article-title":"Leveraging protein language models for accurate multiple sequence alignments","volume":"33","author":"McWhite","year":"2023","journal-title":"Genome Res"},{"key":"2024072300043417000_ref31","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1186\/1471-2105-14-248","article-title":"kClust: fast and sensitive clustering of large protein sequence databases","volume":"14","author":"Hauser","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2024072300043417000_ref32","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol Syst Biol"},{"key":"2024072300043417000_ref33","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1186\/s13059-019-1835-8","article-title":"The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental 
screens","volume":"20","author":"Zhou","year":"2019","journal-title":"Genome Biol"},{"key":"2024072300043417000_ref34","doi-asserted-by":"crossref","first-page":"i53","DOI":"10.1093\/bioinformatics\/btt228","article-title":"Information-theoretic evaluation of predicted ontological annotations","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2024072300043417000_ref35","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1038\/s42256-024-00795-w","article-title":"Protein function prediction as approximate semantic entailment","volume":"6","author":"Kulmanov","year":"2024","journal-title":"Nat Mach Intell"},{"key":"2024072300043417000_ref36","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/s13059-024-03166-1","article-title":"AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding","volume":"25","author":"Zheng","year":"2024","journal-title":"Genome Biol"},{"key":"2024072300043417000_ref37","doi-asserted-by":"crossref","first-page":"217","DOI":"10.2174\/138920306777452312","article-title":"Advances in homology protein structure modeling","volume":"7","author":"Xiang","year":"2006","journal-title":"Curr Protein Pept Sci"},{"key":"2024072300043417000_ref38","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1006\/jmbi.2000.3550","article-title":"Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores","volume":"297","author":"Wilson","year":"2000","journal-title":"J Mol Biol"},{"key":"2024072300043417000_ref39","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1002\/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S","article-title":"Practical limits of function 
prediction","volume":"41","author":"Devos","year":"2000","journal-title":"Proteins"},{"key":"2024072300043417000_ref40","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1016\/j.jmb.2003.08.057","article-title":"How well is enzyme function conserved as a function of pairwise sequence identity?","volume":"333","author":"Tian","year":"2003","journal-title":"J Mol Biol"},{"key":"2024072300043417000_ref41","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2024072300043417000_ref42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13062-021-00291-w","article-title":"Conserved sequence motifs in human TMTC1, TMTC2, TMTC3, and TMTC4, new O-mannosyltransferases from the GT-C\/PMT clan, are rationalized as ligand binding sites","volume":"16","author":"Eisenhaber","year":"2021","journal-title":"Biol Direct"},{"key":"2024072300043417000_ref43","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1002\/prot.25779","article-title":"Prediction of interresidue contacts with DeepMetaPSICOV in CASP13","volume":"87","author":"Kandathil","year":"2019","journal-title":"Proteins"},{"key":"2024072300043417000_ref44","doi-asserted-by":"crossref","first-page":"105450","DOI":"10.1016\/j.jsbmb.2019.105450","article-title":"Species differences in ligand interaction and activation of estrogen receptors in fish and human","volume":"195","author":"Asnake","year":"2019","journal-title":"J Steroid Biochem Mol Biol"},{"key":"2024072300043417000_ref45","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.taap.2014.07.020","article-title":"Selectivity of natural, synthetic and environmental estrogens for zebrafish estrogen receptors","volume":"280","author":"Pinto","year":"2014","journal-title":"Toxicol Appl 
Pharmacol"},{"key":"2024072300043417000_ref46","doi-asserted-by":"crossref","first-page":"1881","DOI":"10.1095\/biolreprod66.6.1881","article-title":"Molecular characterization of three estrogen receptor forms in zebrafish: binding characteristics, transactivation properties, and tissue distributions","volume":"66","author":"Menuet","year":"2002","journal-title":"Biol Reprod"},{"key":"2024072300043417000_ref47","article-title":"StarFunc: fusing template-based and deep learning approaches for accurate protein function prediction","author":"Zhang","year":"2024","journal-title":"bioRxiv"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/4\/bbae349\/58609733\/bbae349.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/4\/bbae349\/58609733\/bbae349.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,22]],"date-time":"2024-07-22T20:05:20Z","timestamp":1721678720000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae349\/7717955"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,23]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,5,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae349","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.11.14.567021","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,5,23]]},"article-number":"bbae349"}}