{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T00:24:16Z","timestamp":1767918256442,"version":"3.49.0"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Measures of protein functional similarity are essential tools for function prediction, evaluation of protein\u2013protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions.<\/jats:p>\n               <jats:p>Results: We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. 
Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement &gt;4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by\u2009&gt;\u20092.5% in F1 score for molecular function hierarchy.<\/jats:p>\n               <jats:p>Availability and implementation: Datasets, results and source code are available at http:\/\/kiwi.cs.dal.ca\/Software\/simDEF<\/jats:p>\n               <jats:p>Contact: \u00a0ahmad.pgh@dal.ca or beiko@cs.dal.ca<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv755","type":"journal-article","created":{"date-parts":[[2015,12,27]],"date-time":"2015-12-27T01:18:54Z","timestamp":1451179134000},"page":"1380-1387","source":"Crossref","is-referenced-by-count":24,"title":["simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes"],"prefix":"10.1093","volume":"32","author":[{"given":"Ahmad","family":"Pesaranghader","sequence":"first","affiliation":[{"name":"1 Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada,"},{"name":"2 Institute for Big Data Analytics, Halifax, NS B3H 4R2, Canada,"}]},{"given":"Stan","family":"Matwin","sequence":"additional","affiliation":[{"name":"1 Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada,"},{"name":"2 Institute for Big Data Analytics, Halifax, NS B3H 4R2, Canada,"},{"name":"3 Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland and"}]},{"given":"Marina","family":"Sokolova","sequence":"additional","affiliation":[{"name":"2 Institute for Big Data Analytics, Halifax, NS B3H 4R2, Canada,"},{"name":"4 Faculty of Medicine and Faculty of Engineering, University of Ottawa, Ottawa, ON K1H 8M5, 
Canada"}]},{"given":"Robert G.","family":"Beiko","sequence":"additional","affiliation":[{"name":"1 Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada,"}]}],"member":"286","published-online":{"date-parts":[[2015,12,26]]},"reference":[{"key":"2023020112220897500_btv755-B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2023020112220897500_btv755-B3","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/nar\/gkg095","article-title":"The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003","volume":"31","author":"Boeckmann","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B4","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1093\/nar\/26.1.73","article-title":"SGD: Saccharomyces genome database","volume":"26","author":"Cherry","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B5","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/2041-1480-2-5","article-title":"Disjunctive shared information between ontology concepts: application to Gene Ontology","volume":"2","author":"Couto","year":"2011","journal-title":"J. Biomed. 
Semant"},{"key":"2023020112220897500_btv755-B6","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/bioinformatics\/btl567","article-title":"Using GOstats to test gene lists for GO term association","volume":"23","author":"Falcon","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020112220897500_btv755-B7","volume-title":"A Synopsis of Linguistic Theory 1930\u20131955, Volume 1952\u201359","author":"Firth","year":"1957"},{"key":"2023020112220897500_btv755-B8","doi-asserted-by":"crossref","first-page":"R183","DOI":"10.1186\/gb-2007-8-9-r183","article-title":"David gene functional classification tool: a novel biological module centric algorithm to functionally analyze large gene list","volume":"8","author":"Huang","year":"2007","journal-title":"Genome Biol"},{"key":"2023020112220897500_btv755-B9","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1186\/1471-2105-11-562","article-title":"An improved method for scoring protein-protein interactions using semantic similarity within the gene ontology","volume":"11","author":"Jain","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023020112220897500_btv755-B10","article-title":"Semantic similarity based on corpus statistics and lexical taxonomy","author":"Jiang","year":"1997","journal-title":"ArXiv Prepr"},{"key":"2023020112220897500_btv755-B11","doi-asserted-by":"crossref","first-page":"2445","DOI":"10.1093\/bioinformatics\/btq449","article-title":"Identifying informative subsets of the Gene Ontology with information bottleneck methods","volume":"26","author":"Jin","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020112220897500_btv755-B12","first-page":"834","volume-title":"AMIA Annual Symposium Proceedings, Vol. 
2011","author":"Jin","year":"2011"},{"key":"2023020112220897500_btv755-B13","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1186\/1471-2164-8-222","article-title":"Quantitative assessment of relationship between sequence similarity and function similarity","volume":"8","author":"Joshi","year":"2007","journal-title":"BMC Genomics"},{"key":"2023020112220897500_btv755-B14","first-page":"pp. 296","volume-title":"Icml","author":"Lin","year":"1998"},{"key":"2023020112220897500_btv755-B15","doi-asserted-by":"crossref","first-page":"12480","DOI":"10.1016\/j.eswa.2009.04.034","article-title":"An weighted ontology-based semantic similarity algorithm for web service","volume":"36","author":"Liu","year":"2009","journal-title":"Expert Syst. Appl"},{"key":"2023020112220897500_btv755-B16","first-page":"363","author":"Liu","year":"2012"},{"key":"2023020112220897500_btv755-B17","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","article-title":"Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation","volume":"19","author":"Lord","year":"2003","journal-title":"Bioinformatics"},{"key":"2023020112220897500_btv755-B18","first-page":"1","article-title":"Using Semantic Similarities and csbl. go for Analyzing Microarray Data","volume":"10","author":"Ovaska","year":"2015","journal-title":"Methods Mol. 
Biol"},{"key":"2023020112220897500_btv755-B19","doi-asserted-by":"crossref","first-page":"38","DOI":"10.3115\/1614025.1614037","volume-title":"Demonstration Papers at Hlt-Naacl 2004","author":"Pedersen","year":"2004"},{"key":"2023020112220897500_btv755-B20","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1007\/978-3-642-40567-9_23","volume-title":"Soft Computing Applications and Intelligent Systems","author":"Pesaranghader","year":"2013"},{"key":"2023020112220897500_btv755-B21","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1109\/ICICM.2013.41","volume-title":"IEEE International Conference on Informatics and Creative Multimedia (ICICM) 2013","author":"Pesaranghader","year":"2013"},{"key":"2023020112220897500_btv755-B22","first-page":"280","article-title":"Word sense disambiguation for biomedical text mining using definition-based semantic relatedness and similarity measures","volume":"4","author":"Pesaranghader","year":"2014","journal-title":"Int. J. Biosci. Biochem. 
Bioinforma"},{"key":"2023020112220897500_btv755-B23","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1007\/978-3-319-06483-3_18","volume-title":"Advances in Artificial Intelligence","author":"Pesaranghader","year":"2014"},{"key":"2023020112220897500_btv755-B24","first-page":"38","author":"Pesquita","year":"2007"},{"key":"2023020112220897500_btv755-B25","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-9-S5-S4","article-title":"Metrics for GO based protein semantic similarity: a systematic evaluation","volume":"9","author":"Pesquita","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020112220897500_btv755-B26","doi-asserted-by":"crossref","first-page":"W134","DOI":"10.1093\/nar\/gkv523","article-title":"INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity","volume":"43","author":"Piovesan","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B27","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.jprot.2015.03.009","article-title":"Extracting high confidence protein interactions from affinity purification data: at the crossroads","volume":"118","author":"Pu","year":"2015","journal-title":"J. 
Proteomics"},{"key":"2023020112220897500_btv755-B28","author":"Resnik","year":"1995"},{"key":"2023020112220897500_btv755-B29","doi-asserted-by":"crossref","first-page":"D449","DOI":"10.1093\/nar\/gkh086","article-title":"The database of interacting proteins: 2004 update","volume":"32","author":"Salwinski","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B30","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1186\/1471-2105-7-302","article-title":"A new measure for functional similarity of gene products based on Gene Ontology","volume":"7","author":"Schlicker","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023020112220897500_btv755-B31","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1109\/TCBB.2005.50","article-title":"Correlation between gene expression and GO semantic similarity","volume":"2","author":"Sevilla","year":"2005","journal-title":"IEEEACM Trans. Comput. Biol. Bioinforma"},{"key":"2023020112220897500_btv755-B32","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1109\/TCBB.2013.176","article-title":"Measure the semantic similarity of go terms using aggregate information content","volume":"TCBB 11","author":"Song","year":"2014","journal-title":"IEEE ACM Trans. Comput. Biol. 
Bioinforma"},{"key":"2023020112220897500_btv755-B33","doi-asserted-by":"crossref","first-page":"1424","DOI":"10.1093\/bioinformatics\/btt160","article-title":"Measuring gene functional similarity based on group-wise comparison of GO terms","volume":"29","author":"Teng","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020112220897500_btv755-B34","doi-asserted-by":"crossref","first-page":"D190","DOI":"10.1093\/nar\/gkm895","article-title":"The universal protein resource (UniProt)","volume":"36","author":"The UniProt Consortium","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B35","first-page":"25","author":"Wang","year":"2004"},{"key":"2023020112220897500_btv755-B36","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/bioinformatics\/btm087","article-title":"A new method to measure the semantic similarity of GO terms","volume":"23","author":"Wang","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020112220897500_btv755-B37","doi-asserted-by":"crossref","first-page":"W214","DOI":"10.1093\/nar\/gkq537","article-title":"The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function","volume":"38","author":"Warde-Farley","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B38","doi-asserted-by":"crossref","first-page":"2137","DOI":"10.1093\/nar\/gkl219","article-title":"Prediction of yeast protein\u2013protein interaction network: insights from the Gene Ontology and annotations","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023020112220897500_btv755-B39","author":"Wu","year":"2013"},{"key":"2023020112220897500_btv755-B40","first-page":"pp. 133","author":"Wu","year":"1994"},{"key":"2023020112220897500_btv755-B41","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1186\/1471-2105-9-472","article-title":"Evaluation of GO-based functional similarity measures using S. 
cerevisiae protein interaction and expression profile data","volume":"9","author":"Xu","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020112220897500_btv755-B42","doi-asserted-by":"crossref","first-page":"1383","DOI":"10.1093\/bioinformatics\/bts129","article-title":"Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty","volume":"28","author":"Yang","year":"2012","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/9\/1380\/49019250\/bioinformatics_32_9_1380.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/9\/1380\/49019250\/bioinformatics_32_9_1380.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:28:13Z","timestamp":1675290493000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/9\/1380\/1743954"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,12,26]]},"references-count":42,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2016,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv755","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,5,1]]},"published":{"date-parts":[[2015,12,26]]}}}