{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T15:19:52Z","timestamp":1764688792753},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Functional genomics data provides a rich source of information that can be used in the annotation of the thousands of genes of unknown function found in most sequenced genomes. However, previous gene function prediction programs are mostly produced for relatively well-annotated organisms that often have a large amount of functional genomics data. Here, we present a novel method for predicting gene function that uses clustering of genes by semantic similarity, a na\u00efve Bayes classifier and \u2018enrichment analysis\u2019 to predict gene function for a genome that is less well annotated but does have a severe effect on human health, that of the malaria parasite Plasmodium falciparum.<\/jats:p>\n               <jats:p>Results: Predictions for the molecular function, biological process and cellular component of P.falciparum genes were created from eight different datasets with a combined prediction also being produced. The high-confidence predictions produced by the combined prediction were compared to those produced by a simple K-nearest neighbour classifier approach and were shown to improve accuracy and coverage. 
Finally, two case studies are described, which investigate two biological processes in more detail, that of translation initiation and invasion of the host cell.<\/jats:p>\n               <jats:p>Availability: Predictions produced are available at http:\/\/www.bioinformatics.leeds.ac.uk\/\u223cbio5pmrt\/PAGODA<\/jats:p>\n               <jats:p>Contact: \u00a0D.R.Westhead@leeds.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq450","type":"journal-article","created":{"date-parts":[[2010,8,7]],"date-time":"2010-08-07T02:17:53Z","timestamp":1281147473000},"page":"2431-2437","source":"Crossref","is-referenced-by-count":19,"title":["Gene function prediction using semantic similarity clustering and enrichment analysis in the malaria parasite <i>Plasmodium falciparum<\/i>"],"prefix":"10.1093","volume":"26","author":[{"given":"Philip M. R.","family":"Tedder","sequence":"first","affiliation":[{"name":"1 Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, 2Applied Computational Biology and Bioinformatics, Paterson Institute for Cancer Research, The University of Manchester, Manchester, M20 4BX, 3School of Computing and 4Institute of Integrative and Comparative Biology, University of Leeds, Leeds, LS2 9JT, UK"}]},{"given":"James R.","family":"Bradford","sequence":"additional","affiliation":[{"name":"1 Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, 2Applied Computational Biology and Bioinformatics, Paterson Institute for Cancer Research, The University of Manchester, Manchester, M20 4BX, 3School of Computing and 4Institute of Integrative and Comparative Biology, University of Leeds, Leeds, LS2 9JT, UK"}]},{"given":"Chris J.","family":"Needham","sequence":"additional","affiliation":[{"name":"1 Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, 2Applied 
Computational Biology and Bioinformatics, Paterson Institute for Cancer Research, The University of Manchester, Manchester, M20 4BX, 3School of Computing and 4Institute of Integrative and Comparative Biology, University of Leeds, Leeds, LS2 9JT, UK"}]},{"given":"Glenn A.","family":"McConkey","sequence":"additional","affiliation":[{"name":"1 Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, 2Applied Computational Biology and Bioinformatics, Paterson Institute for Cancer Research, The University of Manchester, Manchester, M20 4BX, 3School of Computing and 4Institute of Integrative and Comparative Biology, University of Leeds, Leeds, LS2 9JT, UK"}]},{"given":"Andrew J.","family":"Bulpitt","sequence":"additional","affiliation":[{"name":"1 Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, 2Applied Computational Biology and Bioinformatics, Paterson Institute for Cancer Research, The University of Manchester, Manchester, M20 4BX, 3School of Computing and 4Institute of Integrative and Comparative Biology, University of Leeds, Leeds, LS2 9JT, UK"}]},{"given":"David R.","family":"Westhead","sequence":"additional","affiliation":[{"name":"1 Institute of Molecular and Cellular Biology, University of Leeds, Leeds, LS2 9JT, 2Applied Computational Biology and Bioinformatics, Paterson Institute for Cancer Research, The University of Manchester, Manchester, M20 4BX, 3School of Computing and 4Institute of Integrative and Comparative Biology, University of Leeds, Leeds, LS2 9JT, UK"}]}],"member":"286","published-online":{"date-parts":[[2010,8,6]]},"reference":[{"key":"2023012508171320400_B1","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1042\/BST0360653","article-title":"Mechanism of ribosomal subunit joining during eukaryotic translation initiation","volume":"36","author":"Acker","year":"2008","journal-title":"Biochem. Soc. 
Trans."},{"key":"2023012508171320400_B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. the Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012508171320400_B3","doi-asserted-by":"crossref","first-page":"R35","DOI":"10.1186\/gb-2004-5-5-r35","article-title":"Prolinks: a database of protein functional linkages derived from coevolution","volume":"5","author":"Bowers","year":"2004","journal-title":"Genome Biol."},{"key":"2023012508171320400_B4","doi-asserted-by":"crossref","first-page":"E5","DOI":"10.1371\/journal.pbio.0000005","article-title":"The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum","volume":"1","author":"Bozdech","year":"2003","journal-title":"PLoS Biol."},{"key":"2023012508171320400_B5","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1186\/1471-2105-9-440","article-title":"PlasmoDraft: a database of Plasmodium falciparum gene function predictions based on postgenomic data","volume":"9","author":"Brehelin","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508171320400_B6","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1016\/0167-6377(90)90057-C","article-title":"An exact algorithm for the maximum clique problem","volume":"9","author":"Carraghan","year":"1990","journal-title":"Oper. Res. 
Lett."},{"key":"2023012508171320400_B7","doi-asserted-by":"crossref","first-page":"D363","DOI":"10.1093\/nar\/gkj123","article-title":"OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups","volume":"34","author":"Chen","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012508171320400_B8","doi-asserted-by":"crossref","first-page":"542","DOI":"10.1101\/gr.4573206","article-title":"Computational modeling of the Plasmodium falciparum interactome reveals protein function on a genome-wide scale","volume":"16","author":"Date","year":"2006","journal-title":"Genome Res."},{"key":"2023012508171320400_B9","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1038\/nature01107","article-title":"A proteomic view of the Plasmodium falciparum life cycle","volume":"419","author":"Florens","year":"2002","journal-title":"Nature"},{"key":"2023012508171320400_B10","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1016\/j.pt.2006.04.008","article-title":"Progress in in silico functional genomics: the malaria Metabolic Pathways database","volume":"22","author":"Ginsburg","year":"2006","journal-title":"Trends Parasitol."},{"key":"2023012508171320400_B11","doi-asserted-by":"crossref","first-page":"D452","DOI":"10.1093\/nar\/gkh052","article-title":"IntAct: an open source molecular interaction database","volume":"32","author":"Hermjakob","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012508171320400_B12","doi-asserted-by":"crossref","first-page":"D339","DOI":"10.1093\/nar\/gkh007","article-title":"GeneDB: a resource for prokaryotic and eukaryotic organisms","volume":"32","author":"Hertz-Fowler","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012508171320400_B13","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1016\/j.cell.2005.03.027","article-title":"Proteome analysis of separated male and female gametocytes reveals novel sex-specific Plasmodium 
biology","volume":"121","author":"Khan","year":"2005","journal-title":"Cell"},{"key":"2023012508171320400_B14","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1038\/nature04104","article-title":"A protein interaction network of the malaria parasite Plasmodium falciparum","volume":"438","author":"LaCount","year":"2005","journal-title":"Nature"},{"key":"2023012508171320400_B15","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1038\/nature01111","article-title":"Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry","volume":"419","author":"Lasonder","year":"2002","journal-title":"Nature"},{"key":"2023012508171320400_B16","doi-asserted-by":"crossref","first-page":"2308","DOI":"10.1101\/gr.2523904","article-title":"Global analysis of transcript and protein levels across the Plasmodium falciparum life cycle","volume":"14","author":"Le Roch","year":"2004","journal-title":"Genome Res."},{"key":"2023012508171320400_B17","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","article-title":"Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation","volume":"19","author":"Lord","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508171320400_B18","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1126\/science.285.5428.751","article-title":"Detecting protein function and protein-protein interactions from genome sequences","volume":"285","author":"Marcotte","year":"1999","journal-title":"Science"},{"key":"2023012508171320400_B19","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/0169-4758(95)80132-4","article-title":"The cytoplasmic ribosomal RNAs of Plasmodium spp","volume":"11","author":"McCutchan","year":"1995","journal-title":"Parasitol. 
Today"},{"key":"2023012508171320400_B20","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1093\/bioinformatics\/btg097","article-title":"Improvement of the GenTHREADER method for genomic fold recognition","volume":"19","author":"McGuffin","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508171320400_B21","doi-asserted-by":"crossref","first-page":"D224","DOI":"10.1093\/nar\/gkl841","article-title":"New developments in the InterPro database","volume":"35","author":"Mulder","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012508171320400_B22","first-page":"331","article-title":"The Bayes Net Toolbox for MATLAB","volume":"33","author":"Murphy","year":"2001","journal-title":"Comput. Sci. Stat."},{"key":"2023012508171320400_B23","doi-asserted-by":"crossref","first-page":"2896","DOI":"10.1073\/pnas.96.6.2896","article-title":"The use of gene clusters to infer functional coupling","volume":"96","author":"Overbeek","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508171320400_B24","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1186\/1471-2105-10-142","article-title":"Incorporating functional inter-relationships into protein function prediction algorithms","volume":"10","author":"Pandey","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012508171320400_B25","doi-asserted-by":"crossref","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","article-title":"Assigning protein functions by comparative genome analysis: protein phylogenetic profiles","volume":"96","author":"Pellegrini","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"issue":"Suppl. 
1","key":"2023012508171320400_B26","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/gb-2008-9-s1-s2","article-title":"A critical assessment of Mus musculus gene function prediction using integrated genomic evidence","volume":"9","author":"Pena-Castillo","year":"2008","journal-title":"Genome Biol."},{"key":"2023012508171320400_B27","doi-asserted-by":"crossref","first-page":"W116","DOI":"10.1093\/nar\/gki442","article-title":"InterProScan: protein domains identifier","volume":"33","author":"Quevillon","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012508171320400_B28","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1613\/jair.514","article-title":"Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language","volume":"11","author":"Resnik","year":"1999","journal-title":"J. Artif. Intell. Res."},{"key":"2023012508171320400_B29","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1016\/j.ceb.2009.01.023","article-title":"Recent mechanistic insights into eukaryotic ribosomes","volume":"21","author":"Rodnina","year":"2009","journal-title":"Curr. Opin. 
Cell Biol."},{"key":"2023012508171320400_B30","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1016\/j.cell.2009.01.042","article-title":"Regulation of translation initiation in eukaryotes: mechanisms and biological targets","volume":"136","author":"Sonenberg","year":"2009","journal-title":"Cell"},{"key":"2023012508171320400_B31","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1016\/j.pt.2006.09.005","article-title":"PlasmoDB v5: new looks, new genomes","volume":"22","author":"Stoeckert","year":"2006","journal-title":"Trends Parasitol."},{"key":"2023012508171320400_B32","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508171320400_B33","doi-asserted-by":"crossref","first-page":"i529","DOI":"10.1093\/bioinformatics\/btm195","article-title":"Information theory applied to the sparse gene ontology annotation network to predict novel gene function","volume":"23","author":"Tao","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508171320400_B34","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.pt.2009.12.004","article-title":"PlasmoPredict: a gene function prediction website for Plasmodium falciparum","volume":"26","author":"Tedder","year":"2010","journal-title":"Trends Parasitol."},{"key":"2023012508171320400_B35","doi-asserted-by":"crossref","first-page":"8348","DOI":"10.1073\/pnas.0832373100","article-title":"A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)","volume":"100","author":"Troyanskaya","year":"2003","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012508171320400_B36","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/bioinformatics\/btm087","article-title":"A new method to measure the semantic similarity of GO terms","volume":"23","author":"Wang","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508171320400_B37","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1016\/j.ygeno.2009.08.003","article-title":"The transcriptional regulation of protein complexes; a cross-species perspective","volume":"94","author":"Webb","year":"2009","journal-title":"Genomics"},{"key":"2023012508171320400_B38","doi-asserted-by":"crossref","first-page":"1461","DOI":"10.1021\/pr0605769","article-title":"A draft of protein interactions in the malaria parasite P.falciparum","volume":"6","author":"Wuchty","year":"2007","journal-title":"J. Proteome Res."},{"key":"2023012508171320400_B39","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/j.molbiopara.2005.05.007","article-title":"The Plasmodium falciparum sexual development transcriptome: a microarray analysis using ontology-based pattern identification","volume":"143","author":"Young","year":"2005","journal-title":"Mol. Biochem. 
Parasitol."},{"key":"2023012508171320400_B40","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.gene.2005.03.033","article-title":"Broadly predicting specific gene functions with expression similarity and taxonomy similarity","volume":"352","author":"Yu","year":"2005","journal-title":"Gene"},{"key":"2023012508171320400_B41","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/1741-7007-3-14","article-title":"PCI proteins eIF3e and eIF3m define distinct translation initiation factor 3 complexes","volume":"3","author":"Zhou","year":"2005","journal-title":"BMC Biol."},{"key":"2023012508171320400_B42","doi-asserted-by":"crossref","first-page":"1237","DOI":"10.1093\/bioinformatics\/bti111","article-title":"In silico gene function prediction using ontology-based pattern identification","volume":"21","author":"Zhou","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508171320400_B43","doi-asserted-by":"crossref","first-page":"e1570","DOI":"10.1371\/journal.pone.0001570","article-title":"Evidence-based annotation of the malaria parasite's genome using comparative expression profiling","volume":"3","author":"Zhou","year":"2008","journal-title":"Plos 
One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/19\/2431\/48856596\/bioinformatics_26_19_2431.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/19\/2431\/48856596\/bioinformatics_26_19_2431.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:18:44Z","timestamp":1674634724000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/19\/2431\/229697"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,8,6]]},"references-count":43,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2010,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq450","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,10,1]]},"published":{"date-parts":[[2010,8,6]]}}}