{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T05:31:01Z","timestamp":1768887061050,"version":"3.49.0"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"S3","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2013,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Accurate protein function annotation is a severe bottleneck when utilizing the deluge of high-throughput, next generation sequencing data. Keeping database annotations up-to-date has become a major scientific challenge that requires the development of reliable automatic predictors of protein function. The CAFA experiment provided a unique opportunity to undertake comprehensive 'blind testing' of many diverse approaches for automated function prediction. We report on the methodology we used for this challenge and on the lessons we learnt.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>Our method integrates into a single framework a wide variety of biological information sources, encompassing sequence, gene expression and protein-protein interaction data, as well as annotations in UniProt entries. The methodology transfers functional categories based on the results from complementary homology-based and feature-based analyses. We generated the final molecular function and biological process assignments by combining the initial predictions in a probabilistic manner, which takes into account the Gene Ontology hierarchical structure.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We propose a novel scoring function called COmbined Graph-Information Content similarity (COGIC) score for the comparison of predicted functional categories and benchmark data. We demonstrate that our integrative approach provides increased scope and accuracy over both the component methods and the na\u00efve predictors. In line with previous studies, we find that molecular function predictions are more accurate than biological process assignments.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Overall, the results indicate that there is considerable room for improvement in the field. It still remains for the community to invest a great deal of effort to make automated function prediction a useful and routine component in the toolbox of life scientists. As already witnessed in other areas, community-wide blind testing experiments will be pivotal in establishing standards for the evaluation of prediction accuracy, in fostering advancements and new ideas, and ultimately in recording progress.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-14-s3-s1","type":"journal-article","created":{"date-parts":[[2013,3,1]],"date-time":"2013-03-01T01:44:09Z","timestamp":1362102249000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":72,"title":["Protein function prediction by massive integration of evolutionary analyses and multiple data sources"],"prefix":"10.1186","volume":"14","author":[{"given":"Domenico","family":"Cozzetto","sequence":"first","affiliation":[]},{"given":"Daniel WA","family":"Buchan","sequence":"additional","affiliation":[]},{"given":"Kevin","family":"Bryson","sequence":"additional","affiliation":[]},{"given":"David T","family":"Jones","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,2,28]]},"reference":[{"issue":"12","key":"5690_CR1","doi-asserted-by":"publisher","first-page":"e1000605","DOI":"10.1371\/journal.pcbi.1000605","volume":"5","author":"AM Schnoes","year":"2009","unstructured":"Schnoes AM, Brown SD, Dodevski I, Babbitt PC: Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies. PLoS Comput Biol. 2009, 5 (12): e1000605-10.1371\/journal.pcbi.1000605.","journal-title":"PLoS Comput Biol"},{"key":"5690_CR2","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1186\/1471-2105-8-170","volume":"8","author":"CE Jones","year":"2007","unstructured":"Jones CE, Brown AL, Baumann U: Estimating the annotation error rate of curated GO database sequence annotations. BMC Bioinformatics. 2007, 8: 170-10.1186\/1471-2105-8-170.","journal-title":"BMC Bioinformatics"},{"issue":"13","key":"5690_CR3","doi-asserted-by":"publisher","first-page":"i41","DOI":"10.1093\/bioinformatics\/btm229","volume":"23","author":"WA Baumgartner Jr","year":"2007","unstructured":"Baumgartner WA, Cohen KB, Fox LM, Acquaah-Mensah G, Hunter L: Manual curation is not sufficient for annotation of genomic databases. Bioinformatics. 2007, 23 (13): i41-48. 10.1093\/bioinformatics\/btm229.","journal-title":"Bioinformatics"},{"key":"5690_CR4","doi-asserted-by":"crossref","unstructured":"Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 2011, 39 (Database): D214-219.","DOI":"10.1093\/nar\/gkq1020"},{"issue":"7","key":"5690_CR5","doi-asserted-by":"publisher","first-page":"2086","DOI":"10.1002\/prot.23029","volume":"79","author":"WT Clark","year":"2011","unstructured":"Clark WT, Radivojac P: Analysis of protein function and its prediction from amino acid sequence. Proteins. 2011, 79 (7): 2086-2096. 10.1002\/prot.23029.","journal-title":"Proteins"},{"issue":"Database","key":"5690_CR6","doi-asserted-by":"publisher","first-page":"D306","DOI":"10.1093\/nar\/gkr948","volume":"40","author":"S Hunter","year":"2012","unstructured":"Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S: InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res. 2012, 40 (Database): D306-312.","journal-title":"Nucleic Acids Res"},{"issue":"Database","key":"5690_CR7","doi-asserted-by":"publisher","first-page":"D465","DOI":"10.1093\/nar\/gkr1181","volume":"40","author":"J Lees","year":"2012","unstructured":"Lees J, Yeats C, Perkins J, Sillitoe I, Rentzsch R, Dessailly BH, Orengo C: Gene3D: a domain-based resource for comparative genomics, functional annotation and protein network analysis. Nucleic Acids Res. 2012, 40 (Database): D465-471.","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"5690_CR8","doi-asserted-by":"publisher","first-page":"1969","DOI":"10.1101\/gr.104687.109","volume":"21","author":"BE Engelhardt","year":"2011","unstructured":"Engelhardt BE, Jordan MI, Srouji JR, Brenner SE: Genome-scale phylogenetic function annotation of large and diverse protein families. Genome Res. 2011, 21 (11): 1969-1980. 10.1101\/gr.104687.109.","journal-title":"Genome Res"},{"key":"5690_CR9","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1186\/1471-2105-5-178","volume":"5","author":"DM Martin","year":"2004","unstructured":"Martin DM, Berriman M, Barton GJ: GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes. BMC Bioinformatics. 2004, 5: 178-10.1186\/1471-2105-5-178.","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"5690_CR10","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1002\/prot.22172","volume":"74","author":"T Hawkins","year":"2009","unstructured":"Hawkins T, Chitale M, Luban S, Kihara D: PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins. 2009, 74 (3): 566-582. 10.1002\/prot.22172.","journal-title":"Proteins"},{"issue":"5","key":"5690_CR11","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1093\/bioinformatics\/btg036","volume":"19","author":"LJ Jensen","year":"2003","unstructured":"Jensen LJ, Gupta R, Staerfeldt HH, Brunak S: Prediction of human protein function according to Gene Ontology categories. Bioinformatics. 2003, 19 (5): 635-642. 10.1093\/bioinformatics\/btg036.","journal-title":"Bioinformatics"},{"issue":"Web Server","key":"5690_CR12","doi-asserted-by":"publisher","first-page":"W297","DOI":"10.1093\/nar\/gkn193","volume":"36","author":"AE Lobley","year":"2008","unstructured":"Lobley AE, Nugent T, Orengo CA, Jones DT: FFPred: an integrated feature-based function prediction server for vertebrate proteomes. Nucleic Acids Res. 2008, 36 (Web Server): W297-302. 10.1093\/nar\/gkn193.","journal-title":"Nucleic Acids Res"},{"issue":"25","key":"5690_CR13","doi-asserted-by":"publisher","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","volume":"95","author":"MB Eisen","year":"1998","unstructured":"Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95 (25): 14863-14868. 10.1073\/pnas.95.25.14863.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"1","key":"5690_CR14","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1073\/pnas.97.1.262","volume":"97","author":"MP Brown","year":"2000","unstructured":"Brown MP, Grundy WN, Lin D, Cristianini N, Sugnet CW, Furey TS, Ares M, Haussler D: Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci USA. 2000, 97 (1): 262-267. 10.1073\/pnas.97.1.262.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"Suppl 3","key":"5690_CR15","doi-asserted-by":"publisher","first-page":"S10","DOI":"10.1186\/1471-2164-11-S3-S10","volume":"11","author":"J Wang","year":"2010","unstructured":"Wang J, Li M, Deng Y, Pan Y: Recent advances in clustering methods for protein interaction networks. BMC Genomics. 2010, 11 (Suppl 3): S10-10.1186\/1471-2164-11-S3-S10.","journal-title":"BMC Genomics"},{"key":"5690_CR16","doi-asserted-by":"publisher","first-page":"2005","DOI":"10.1038\/msb4100005","volume":"1","author":"A Tanay","year":"2005","unstructured":"Tanay A, Steinfeld I, Kupiec M, Shamir R: Integrative analysis of genome-wide experiments in the context of a large high-throughput data compendium. Mol Syst Biol. 2005, 1: 2005-0002","journal-title":"Mol Syst Biol"},{"issue":"Suppl 1","key":"5690_CR17","doi-asserted-by":"publisher","first-page":"S1","DOI":"10.1186\/gb-2008-9-s1-s1","volume":"9","author":"TR Hughes","year":"2008","unstructured":"Hughes TR, Roth FP: A race through the maze of genomic evidence. Genome Biol. 2008, 9 (Suppl 1): S1-10.1186\/gb-2008-9-s1-s1.","journal-title":"Genome Biol"},{"issue":"Suppl 1","key":"5690_CR18","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/gb-2008-9-s1-s2","volume":"9","author":"L Pena-Castillo","year":"2008","unstructured":"Pena-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK: A critical assessment of Mus musculus gene function prediction using integrated genomic evidence. Genome Biol. 2008, 9 (Suppl 1): S2-10.1186\/gb-2008-9-s1-s2.","journal-title":"Genome Biol"},{"issue":"1","key":"5690_CR19","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"M Ashburner","year":"2000","unstructured":"Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038\/75556.","journal-title":"Nat Genet"},{"issue":"17","key":"5690_CR20","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093\/nar\/25.17.3389.","journal-title":"Nucleic Acids Res"},{"issue":"10","key":"5690_CR21","doi-asserted-by":"publisher","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","volume":"23","author":"BE Suzek","year":"2007","unstructured":"Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH: UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007, 23 (10): 1282-1288. 10.1093\/bioinformatics\/btm098.","journal-title":"Bioinformatics"},{"key":"5690_CR22","first-page":"41","volume-title":"Proceedings of the AAAI-98 Workshop on Learning for Text Categorization","author":"A McCallum","year":"1998","unstructured":"McCallum A, Nigam K: A comparison of event models for Naive Bayes text classification. Proceedings of the AAAI-98 Workshop on Learning for Text Categorization. 1998, 41-48."},{"issue":"Database","key":"5690_CR23","doi-asserted-by":"publisher","first-page":"D190","DOI":"10.1093\/nar\/gkp951","volume":"38","author":"J Muller","year":"2010","unstructured":"Muller J, Szklarczyk D, Julien P, Letunic I, Roth A, Kuhn M, Powell S, von Mering C, Doerks T, Jensen LJ: eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 2010, 38 (Database): D190-195. 10.1093\/nar\/gkp951.","journal-title":"Nucleic Acids Res"},{"issue":"Database","key":"5690_CR24","doi-asserted-by":"publisher","first-page":"D396","DOI":"10.1093\/nar\/gkn803","volume":"37","author":"D Barrell","year":"2009","unstructured":"Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R: The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Res. 2009, 37 (Database): D396-403. 10.1093\/nar\/gkn803.","journal-title":"Nucleic Acids Res"},{"key":"5690_CR25","volume-title":"Doctoral Thesis","author":"AE Lobley","year":"2010","unstructured":"Lobley AE: Human protein function prediction: application of machine learning for integration of heterogeneous data sources. Doctoral Thesis. 2010, London: University College London"},{"issue":"Suppl 5","key":"5690_CR26","doi-asserted-by":"publisher","first-page":"S4","DOI":"10.1186\/1471-2105-9-S5-S4","volume":"9","author":"C Pesquita","year":"2008","unstructured":"Pesquita C, Faria D, Bastos H, Ferreira AE, Falcao AO, Couto FM: Metrics for GO based protein semantic similarity: a systematic evaluation. BMC Bioinformatics. 2008, 9 (Suppl 5): S4-10.1186\/1471-2105-9-S5-S4.","journal-title":"BMC Bioinformatics"},{"issue":"9","key":"5690_CR27","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.1093\/bioinformatics\/btp122","volume":"25","author":"MF Rogers","year":"2009","unstructured":"Rogers MF, Ben-Hur A: The use of gene ontology evidence codes in preventing classifier assessment bias. Bioinformatics. 2009, 25 (9): 1173-1177. 10.1093\/bioinformatics\/btp122.","journal-title":"Bioinformatics"},{"issue":"3","key":"5690_CR28","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.","journal-title":"J Mol Biol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-14-S3-S1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:26:36Z","timestamp":1630535196000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-14-S3-S1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,2]]},"references-count":28,"journal-issue":{"issue":"S3","published-print":{"date-parts":[[2013,2]]}},"alternative-id":["5690"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-14-s3-s1","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,2]]},"assertion":[{"value":"28 February 2013","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S1"}}