{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T04:15:40Z","timestamp":1749096940900,"version":"3.41.0"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"23","funder":[{"name":"the European Commission via projects MAESTRA","award":["ICT-2013-612944"],"award-info":[{"award-number":["ICT-2013-612944"]}]},{"name":"InnoMol","award":["316289"],"award-info":[{"award-number":["316289"]}]},{"name":"MULTIPLEX","award":["317532"],"award-info":[{"award-number":["317532"]}]},{"DOI":"10.13039\/501100004488","name":"the Croatian Science Foundation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004488","id-type":"DOI","asserted-by":"crossref"}]},{"name":"DescriptiveInduction","award":["HRZZ-9623"],"award-info":[{"award-number":["HRZZ-9623"]}]},{"name":"Multicast","award":["HRZZ-5660"],"award-info":[{"award-number":["HRZZ-5660"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,12,1]]},"abstract":"<jats:p>Motivation: The number of sequenced genomes rises steadily but we still lack the knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions.<\/jats:p><jats:p>Results: Our pipeline amalgamates 5\u00a0133\u00a0543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. 
Thus, inferences made by diverse genomic AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene\/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits\/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits\/gene using individual AFP methods or by 11 additional bits\/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them.<\/jats:p><jats:p>Availability and Implementation: The individual and integrated GO predictions for the complete set of genes are available from http:\/\/gorbi.irb.hr\/.<\/jats:p><jats:p>Contact: \u00a0fran.supek@irb.hr<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary materials are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw532","type":"journal-article","created":{"date-parts":[[2016,8,14]],"date-time":"2016-08-14T00:17:57Z","timestamp":1471133877000},"page":"3645-3653","source":"Crossref","is-referenced-by-count":12,"title":["Extensive complementarity between gene function prediction methods"],"prefix":"10.1093","volume":"32","author":[{"given":"Vedrana","family":"Vidulin","sequence":"first","affiliation":[{"name":"1Division of Electronics, Ru\u0111er Bo\u0161kovi\u0107 Institute, Bijeni\u010dka cesta 54, Zagreb 10000, 
Croatia"}]},{"given":"Fran","family":"Supek","sequence":"additional","affiliation":[{"name":"1Division of Electronics, Ru\u0111er Bo\u0161kovi\u0107 Institute, Bijeni\u010dka cesta 54, Zagreb 10000, Croatia"},{"name":"2EMBL\/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology and UPF, Dr. Aiguader 88, Barcelona 08003, Spain"}]}],"member":"286","published-online":{"date-parts":[[2016,8,13]]},"reference":[{"key":"2023020114073582400_btw532-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023020114073582400_btw532-B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nature Genet"},{"year":"2006","author":"Blockeel","key":"2023020114073582400_btw532-B3"},{"key":"2023020114073582400_btw532-B4","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Machine Learning"},{"key":"2023020114073582400_btw532-B5","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1016\/j.mib.2013.01.008","article-title":"High-throughput approaches to understanding gene function and mapping network architecture in bacteria","volume":"16","author":"Brochado","year":"2013","journal-title":"Curr. Opin. 
Microbiol"},{"key":"2023020114073582400_btw532-B6","doi-asserted-by":"crossref","first-page":"S17.","DOI":"10.1186\/1471-2105-6-S1-S17","article-title":"An evaluation of GO annotation retrieval for BioCreAtIvE and GOA","volume":"6","author":"Camon","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020114073582400_btw532-B7","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.ymeth.2015.09.011","article-title":"Integrated protein function prediction by mining function associations, sequences, and protein\u2013protein and gene-gene interaction networks","volume":"93","author":"Cao","year":"2016","journal-title":"Methods"},{"key":"2023020114073582400_btw532-B8","doi-asserted-by":"crossref","first-page":"i53","DOI":"10.1093\/bioinformatics\/btt228","article-title":"Information-theoretic evaluation of predicted ontological annotations","volume":"29","author":"Clark","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B9","doi-asserted-by":"crossref","first-page":"S1.","DOI":"10.1186\/1471-2105-14-S3-S1","article-title":"Protein function prediction by massive integration of evolutionary analyses and multiple data sources","volume":"14(Suppl 3)","author":"Cozzetto","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020114073582400_btw532-B10","doi-asserted-by":"crossref","first-page":"e48728.","DOI":"10.1371\/journal.pone.0048728","article-title":"Efficient prediction of co-complexed proteins based on coevolution","volume":"7","author":"de Vienne","year":"2012","journal-title":"PloS One"},{"key":"2023020114073582400_btw532-B11","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1016\/j.tig.2013.09.005","article-title":"CAFA and the open world of protein function predictions","volume":"29","author":"Dessimoz","year":"2013","journal-title":"Trends 
Genet"},{"key":"2023020114073582400_btw532-B12","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/s10044-013-0336-8","article-title":"Performance evaluation of early and late fusion methods for generic semantics indexing","volume":"17","author":"Dong","year":"2014","journal-title":"Pattern Anal. Appl"},{"key":"2023020114073582400_btw532-B14","doi-asserted-by":"crossref","first-page":"9033","DOI":"10.1073\/pnas.0402591101","article-title":"Coevolution of gene expression among interacting proteins","volume":"101","author":"Fraser","year":"2004","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023020114073582400_btw532-B15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-14-S3-S7","article-title":"Homology-based inference sets the bar high for protein function prediction","volume":"14","author":"Hamp","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020114073582400_btw532-B16","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1002\/prot.22172","article-title":"PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data","volume":"74","author":"Hawkins","year":"2009","journal-title":"Proteins"},{"key":"2023020114073582400_btw532-B17","doi-asserted-by":"crossref","first-page":"929.","DOI":"10.1371\/journal.pbio.1000096","article-title":"Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins","volume":"7","author":"Hu","year":"2009","journal-title":"PLoS Biol"},{"key":"2023020114073582400_btw532-B18","doi-asserted-by":"crossref","first-page":"D306","DOI":"10.1093\/nar\/gkr948","article-title":"InterPro in 2011: new developments in the family and domain prediction database","volume":"40","author":"Hunter","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020114073582400_btw532-B19","doi-asserted-by":"crossref","first-page":"635","DOI":"10.1093\/bioinformatics\/btg036","article-title":"Prediction of human 
protein function according to Gene Ontology categories","volume":"19","author":"Jensen","year":"2003","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B47","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-016-1037-6","article-title":"An expanded evaluation of protein function prediction methods shows an improvement in accuracy","author":"Jiang","year":"2016","journal-title":"Genome Biol, 2016"},{"key":"2023020114073582400_btw532-B20","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.1093\/bioinformatics\/btu031","article-title":"InterProScan 5: genome-scale protein function classification","volume":"30","author":"Jones","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B21","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1098\/rsif.2007.1047","article-title":"Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution","volume":"5","author":"Kensche","year":"2008","journal-title":"J. R. Soc. 
Interface"},{"key":"2023020114073582400_btw532-B22","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1093\/bioinformatics\/17.5.445","article-title":"The utility of different representations of protein sequence for predicting functional class","volume":"17","author":"King","year":"2001","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B23","doi-asserted-by":"crossref","first-page":"R44.","DOI":"10.1186\/gb-2014-15-3-r44","article-title":"Inferring gene function from evolutionary change in signatures of translation efficiency","volume":"15","author":"Kri\u0161ko","year":"2014","journal-title":"Genome Biol"},{"key":"2023020114073582400_btw532-B24","doi-asserted-by":"crossref","first-page":"2626","DOI":"10.1093\/bioinformatics\/bth294","article-title":"A statistical framework for genomic data fusion","volume":"20","author":"Lanckriet","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B25","doi-asserted-by":"crossref","first-page":"1555","DOI":"10.1126\/science.1099511","article-title":"A probabilistic functional network of yeast genes","volume":"306","author":"Lee","year":"2004","journal-title":"Science"},{"key":"2023020114073582400_btw532-B26","doi-asserted-by":"crossref","first-page":"1143","DOI":"10.1101\/gr.102749.109","article-title":"Predicting genetic modifier loci using functional gene networks","volume":"20","author":"Lee","year":"2010","journal-title":"Genome Res"},{"key":"2023020114073582400_btw532-B27","doi-asserted-by":"crossref","first-page":"253.","DOI":"10.1186\/1471-2105-13-253","article-title":"G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes","volume":"13","author":"Lemay","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020114073582400_btw532-B28","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1093\/bioinformatics\/btp027","article-title":"Detecting gene clusters under evolutionary constraint in a large number of 
genomes","volume":"25","author":"Ling","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B29","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1007\/s10994-013-5377-0","article-title":"On using nearly-independent feature families for high precision and confidence","volume":"92","author":"Madani","year":"2013","journal-title":"Machine Learning"},{"key":"2023020114073582400_btw532-B30","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1038\/ng1967","article-title":"Differential translation efficiency of orthologous genes is involved in phenotypic divergence of yeast species","volume":"39","author":"Man","year":"2007","journal-title":"Nature Genet"},{"key":"2023020114073582400_btw532-B31","doi-asserted-by":"crossref","first-page":"e63754.","DOI":"10.1371\/journal.pone.0063754","article-title":"FFPred 2.0: improved homology-independent prediction of gene ontology terms for eukaryotic protein sequences","volume":"8","author":"Minneci","year":"2013","journal-title":"PLoS ONE"},{"key":"2023020114073582400_btw532-B32","doi-asserted-by":"crossref","first-page":"1759","DOI":"10.1093\/bioinformatics\/btq262","article-title":"Fast integration of heterogeneous data sources for predicting gene function with limited annotation","volume":"26","author":"Mostafavi","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B33","doi-asserted-by":"crossref","first-page":"2322","DOI":"10.1093\/bioinformatics\/btm332","article-title":"Context-sensitive data integration and prediction of biological networks","volume":"23","author":"Myers","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B34","first-page":"btv345.","article-title":"ProFET: Feature engineering captures high-level protein 
functions","author":"Ofer","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B35","doi-asserted-by":"crossref","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","article-title":"Assigning protein functions by comparative genome analysis: protein phylogenetic profiles","volume":"96","author":"Pellegrini","year":"1999","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020114073582400_btw532-B36","doi-asserted-by":"crossref","first-page":"D231","DOI":"10.1093\/nar\/gkt1253","article-title":"eggNOG v4.0: nested orthology inference across 3686 organisms","volume":"42","author":"Powell","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020114073582400_btw532-B37","doi-asserted-by":"crossref","first-page":"D290","DOI":"10.1093\/nar\/gkr1065","article-title":"The Pfam protein families database","volume":"40","author":"Punta","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020114073582400_btw532-B38","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat. 
Methods"},{"key":"2023020114073582400_btw532-B39","doi-asserted-by":"crossref","first-page":"2212","DOI":"10.1093\/nar\/30.10.2212","article-title":"Connected gene neighborhoods in prokaryotic genomes","volume":"30","author":"Rogozin","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023020114073582400_btw532-B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-11-2","article-title":"Predicting gene function using hierarchical multi-label decision tree ensembles","volume":"11","author":"Schietgat","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020114073582400_btw532-B41","doi-asserted-by":"crossref","first-page":"729","DOI":"10.1039\/b913690h","article-title":"Finding explained groups of time-course gene expression profiles with predictive clustering trees","volume":"6","author":"Slavkov","year":"2010","journal-title":"Mol. BioSyst"},{"key":"2023020114073582400_btw532-B42","doi-asserted-by":"crossref","first-page":"399","DOI":"10.1145\/1101149.1101236","volume-title":"Proceedings of the 13th annual ACM international conference on Multimedia (MULTIMEDIA \u201905)","author":"Snoek","year":"2005"},{"key":"2023020114073582400_btw532-B43","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1142\/S0219720010004744","article-title":"Hierarchical classification of Gene Ontology terms using the GOstruct method","volume":"8","author":"Sokolov","year":"2010","journal-title":"J. Bioinform. Comput. Biol"},{"key":"2023020114073582400_btw532-B44","doi-asserted-by":"crossref","first-page":"e1002533","DOI":"10.1371\/journal.pcbi.1002533","article-title":"Quality of computationally inferred gene ontology annotations","volume":"8","author":"\u0160kunca","year":"2012","journal-title":"PLoS Comput. 
Biol"},{"key":"2023020114073582400_btw532-B45","doi-asserted-by":"crossref","first-page":"e1002852","DOI":"10.1371\/journal.pcbi.1002852","article-title":"Phyletic profiling with cliques of orthologs is enhanced by signatures of paralogy relationships","volume":"9","author":"\u0160kunca","year":"2013","journal-title":"PLoS Comput. Biol"},{"key":"2023020114073582400_btw532-B46","doi-asserted-by":"crossref","first-page":"e1001004.","DOI":"10.1371\/journal.pgen.1001004","article-title":"Translational selection is ubiquitous in prokaryotes","volume":"6","author":"Supek","year":"2010","journal-title":"PLoS Genet"},{"key":"2023020114073582400_btw532-B48","doi-asserted-by":"crossref","first-page":"S7.","DOI":"10.1186\/gb-2008-9-s1-s7","article-title":"Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function","volume":"9(Suppl 1)","author":"Tian","year":"2008","journal-title":"Genome Biol"},{"key":"2023020114073582400_btw532-B49","doi-asserted-by":"crossref","first-page":"8348","DOI":"10.1073\/pnas.0832373100","article-title":"A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)","volume":"100","author":"Troyanskaya","year":"2003","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"article-title":"Support vector classifier with asymmetric kernel functions","year":"1999","author":"Tsuda","key":"2023020114073582400_btw532-B50"},{"key":"2023020114073582400_btw532-B51","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1007\/s10994-008-5077-3","article-title":"Decision trees for hierarchical multi-label classification","volume":"73","author":"Vens","year":"2008","journal-title":"Machine Learning"},{"key":"2023020114073582400_btw532-B52","first-page":"D433","article-title":"STRING: known and predicted protein\u2013protein associations, integrated and transferred across organisms","volume":"33(suppl 1)","author":"Von Mering","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023020114073582400_btw532-B53","doi-asserted-by":"crossref","first-page":"798","DOI":"10.1093\/bioinformatics\/btn037","article-title":"ConFunc\u2014functional annotation in the twilight zone","volume":"24","author":"Wass","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020114073582400_btw532-B54","doi-asserted-by":"crossref","first-page":"W466","DOI":"10.1093\/nar\/gks489","article-title":"CombFunc: predicting protein function using heterogeneous data sources","volume":"40","author":"Wass","year":"2012","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/23\/3645\/49027091\/bioinformatics_32_23_3645.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/23\/3645\/49027091\/bioinformatics_32_23_3645.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,4]],"date-time":"2025-06-04T18:02:00Z","timestamp":1749060120000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/23\/3645\/2525646"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,8,13]]},"references-count":53,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2016,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw532","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2016,12,1]]},"published":{"date-parts":[[2016,8,13]]}}}