{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T12:45:36Z","timestamp":1759495536598,"version":"3.37.3"},"reference-count":59,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T00:00:00Z","timestamp":1652140800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute for Data Valorization (IVADO)\/Genome Quebec","award":["PRF-2017-023"],"award-info":[{"award-number":["PRF-2017-023"]}]},{"name":"NSERC CREATE"},{"name":"Poland\u2019s National Scientific Center"},{"name":"NSERC Discovery"},{"name":"Fonds de la Recherche du Qu\u00e9bec en Sant\u00e9 (FRQS) Junior 1 Scholar"},{"DOI":"10.13039\/501100001804","name":"Canada Research Chairs program","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001804","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,5,26]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>There is a plethora of measures to evaluate functional similarity (FS) of genes based on their co-expression, protein\u2013protein interactions and sequence similarity. These measures are typically derived from hand-engineered and application-specific metrics to quantify the degree of shared information between two genes using their Gene Ontology (GO) annotations.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We introduce deepSimDEF, a deep learning method to automatically learn FS estimation of gene pairs given a set of genes and their GO annotations. 
deepSimDEF\u2019s key novelty is its ability to learn low-dimensional embedding vector representations of GO terms and gene products and then calculate FS using these learned vectors. We show that deepSimDEF can predict the FS of new genes using their annotations: it outperformed all other FS measures by &gt;5\u201310% on yeast and human reference datasets on protein\u2013protein interactions, gene co-expression and sequence homology tasks. Thus, deepSimDEF offers a powerful and adaptable deep neural architecture that can benefit a wide range of problems in genomics and proteomics, and its architecture is flexible enough to support its extension to any organism.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Source code and data are available at https:\/\/github.com\/ahmadpgh\/deepSimDEF<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac304","type":"journal-article","created":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T19:28:25Z","timestamp":1651865305000},"page":"3051-3061","source":"Crossref","is-referenced-by-count":8,"title":["deepSimDEF: deep neural embeddings of gene products and gene ontology terms for functional analysis of genes"],"prefix":"10.1093","volume":"38","author":[{"given":"Ahmad","family":"Pesaranghader","sequence":"first","affiliation":[{"name":"Montreal Heart Institute , Research Center, Montreal H1T 1C8, Canada"},{"name":"Faculty of Medicine, University of Montreal , Montreal H3T 1J4, Canada"},{"name":"Mila\u2014Quebec Artificial Intelligence Institute , Montreal H2S 3H1, Canada"},{"name":"Department of Computer Science and Operations Research, University of Montreal , Montreal H3T 1J4, Canada"}]},{"given":"Stan","family":"Matwin","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University , 
Halifax B3H 4R2, Canada"},{"name":"Institute for Big Data Analytics, Dalhousie University , Halifax B3H 4R2, Canada"},{"name":"Institute of Computer Science, Polish Academy of Sciences , Warsaw, Poland"}]},{"given":"Marina","family":"Sokolova","sequence":"additional","affiliation":[{"name":"Institute for Big Data Analytics, Dalhousie University , Halifax B3H 4R2, Canada"},{"name":"Faculty of Medicine and Faculty of Engineering, University of Ottawa , Ottawa K1H 8M5, Canada"}]},{"given":"Jean-Christophe","family":"Grenier","sequence":"additional","affiliation":[{"name":"Montreal Heart Institute , Research Center, Montreal H1T 1C8, Canada"}]},{"given":"Robert G","family":"Beiko","sequence":"additional","affiliation":[{"name":"Faculty of Computer Science, Dalhousie University , Halifax B3H 4R2, Canada"},{"name":"Institute for Big Data Analytics, Dalhousie University , Halifax B3H 4R2, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4295-3339","authenticated-orcid":false,"given":"Julie","family":"Hussin","sequence":"additional","affiliation":[{"name":"Montreal Heart Institute , Research Center, Montreal H1T 1C8, Canada"},{"name":"Faculty of Medicine, University of Montreal , Montreal H3T 1J4, Canada"}]}],"member":"286","published-online":{"date-parts":[[2022,5,10]]},"reference":[{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. 
Biol"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"e0141287","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PLoS One"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","DOI":"10.3389\/fcvm.2021.711401","article-title":"Implementing machine learning in interventional cardiology: the benefits are worth the trouble","author":"Ben Ali","year":"2021","journal-title":"Front. Cardiovasc. Med"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/j.csbj.2017.01.009","article-title":"The effects of shared information on semantic calculations in the gene ontology","volume":"15","author":"Bible","year":"2017","journal-title":"Comput. Struct. Biotechnol. 
J"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"2185","DOI":"10.1093\/bioinformatics\/bty085","article-title":"The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier","volume":"34","author":"Cao","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"648","DOI":"10.1126\/science.1262110","article-title":"The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans","volume":"348","author":"Ardlie","year":"2015","journal-title":"Science"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/2041-1480-2-5","article-title":"Disjunctive shared information between ontology concepts: application to gene ontology","volume":"2","author":"Couto","year":"2011","journal-title":"J. Biomed. Semantics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/978-1-4939-3743-1_5","article-title":"Computational methods for annotation transfers from sequence","volume":"1446","author":"Cozzetto","year":"2017","journal-title":"Methods Mol. Biol. (Clifton, NJ)"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4939-3743-1","volume-title":"The Gene Ontology Handbook","author":"Dessimoz","year":"2017"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","DOI":"10.1089\/cmb.2018.0093","article-title":"Word and sentence embedding tools to measure semantic similarity of gene ontology terms by their definitions","author":"Duong","year":"2019","journal-title":"J. Comput. 
Biol"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1109\/TCBB.2017.2689762","article-title":"Assessment of semantic similarity between proteins using information content and topological properties of the gene ontology graph","volume":"15","author":"Dutta","year":"2018","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/bioinformatics\/btl567","article-title":"Using GOstats to test gene lists for go term association","volume":"23","author":"Falcon","year":"2007","journal-title":"Bioinformatics"},{"year":"1957","author":"Firth","key":"2023041403082225100_"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1093\/bib\/bbr066","article-title":"Semantic similarity analysis of protein data: assessment with biological features and issues","volume":"13","author":"Guzzi","year":"2012","journal-title":"Brief. 
Bioinform"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A fast learning algorithm for deep belief nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Comput"},{"year":"1997","author":"Jiang","key":"2023041403082225100_"},{"year":"2017","author":"Jiang","key":"2023041403082225100_"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/s12859-019-2811-8","article-title":"Drug repositioning of herbal compounds via a machine-learning approach","volume":"20","author":"Kim","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1038\/234034a0","article-title":"Distance between sets","volume":"234","author":"Levandowsky","year":"1971","journal-title":"Nature"},{"key":"2023041403082225100_","first-page":"296","volume-title":"ICML","author":"Lin","year":"1998"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-020-80786-0","article-title":"Embeddings from deep learning transfer go annotations beyond homology","volume":"11","author":"Littmann","year":"2021","journal-title":"Sci. 
Rep"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","article-title":"Investigating semantic similarity measures across the gene ontology: the relationship between sequence and annotation","volume":"19","author":"Lord","year":"2003","journal-title":"Bioinformatics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"1182","DOI":"10.1093\/bioinformatics\/btz731","article-title":"Metric learning on expression data for gene function prediction","volume":"36","author":"Makrodimitris","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041403082225100_","first-page":"886","article-title":"Gene ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery","volume":"18","author":"Mazandu","year":"2017","journal-title":"Brief. Bioinform"},{"key":"2023041403082225100_","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","author":"Mikolov","year":"2013","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.sbi.2017.02.005","article-title":"Network analysis and in silico prediction of protein\u2013protein interactions with applications in drug discovery","volume":"44","author":"Murakami","year":"2017","journal-title":"Curr. Opin. Struct. 
Biol"},{"first-page":"807","year":"2010","author":"Nair","key":"2023041403082225100_"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1186\/s12859-017-1605-0","article-title":"Cross disease analysis of co-functional microRNA pairs on a reconstructed network of disease-gene-microRNA tripartite","volume":"18","author":"Peng","year":"2017","journal-title":"BMC Bioinformatics"},{"year":"2019","author":"Pesaranghader","key":"2023041403082225100_"},{"key":"2023041403082225100_","first-page":"129","volume-title":"Joint International Semantic Technology Conference","author":"Pesaranghader","year":"2013"},{"first-page":"196","year":"2013","author":"Pesaranghader","key":"2023041403082225100_"},{"key":"2023041403082225100_","first-page":"203","volume-title":"Canadian Conference on Artificial Intelligence","author":"Pesaranghader","year":"2014"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"1380","DOI":"10.1093\/bioinformatics\/btv755","article-title":"simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes","volume":"32","author":"Pesaranghader","year":"2016","journal-title":"Bioinformatics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"438","DOI":"10.1093\/jamia\/ocy189","article-title":"deepBioWSD: effective deep neural word sense disambiguation of biomedical text data","volume":"26","author":"Pesaranghader","year":"2019","journal-title":"J. Am. Med. Inform. 
Assoc"},{"first-page":"67","year":"2021","author":"Pesaranghader","key":"2023041403082225100_"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","DOI":"10.1101\/2021.08.13.456305","article-title":"ImputeCoVNet: 2D ResNet Autoencoder for imputation of SARS-CoV-2 sequences","volume-title":"bioRxiv","author":"Pesaranghader","year":"2021"},{"first-page":"38","year":"2007","author":"Pesquita","key":"2023041403082225100_"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-9-S5-S4","article-title":"Metrics for go based protein semantic similarity: a systematic evaluation","volume":"9","author":"Pesquita","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1016\/S0893-6080(98)00010-0","article-title":"Automatic early stopping using cross validation: quantifying the criteria","volume":"11","author":"Prechelt","year":"1998","journal-title":"Neural Netw"},{"key":"2023041403082225100_","article-title":"Using information content to evaluate semantic similarity in a taxonomy","volume-title":"arXiv preprint cmp-lg\/9511007","author":"Resnik","year":"1995"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"20707","DOI":"10.1038\/srep20707","article-title":"Cell type-specific properties and environment shape tissue specificity of cancer genes","volume":"6","author":"Schaefer","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"R33","DOI":"10.1186\/gb-2007-8-3-r33","article-title":"GOTax: investigating biological processes and biochemical activities along the taxonomic tree","volume":"8","author":"Schlicker","year":"2007","journal-title":"Genome Biol"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"e0244430","DOI":"10.1371\/journal.pone.0244430","article-title":"PFP-WGAN: protein function prediction by discovering gene ontology term correlations with generative adversarial networks","volume":"16","author":"Seyyedsalehi","year":"2021","journal-title":"PLoS One"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1109\/TCBB.2013.176","article-title":"Measure the semantic similarity of go terms using aggregate information content","volume":"11","author":"Song","year":"2014","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"3175","DOI":"10.1093\/bioinformatics\/btw342","article-title":"A probabilistic approach for collective similarity-based drug\u2013drug interaction prediction","volume":"32","author":"Sridhar","year":"2016","journal-title":"Bioinformatics"},{"key":"2023041403082225100_","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023041403082225100_","first-page":"2377","article-title":"Training very deep networks","author":"Srivastava","year":"2015"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"1424","DOI":"10.1093\/bioinformatics\/btt160","article-title":"Measuring gene functional similarity based on group-wise comparison of go terms","volume":"29","author":"Teng","year":"2013","journal-title":"Bioinformatics"},{"first-page":"1672","year":"2020","author":"Tian","key":"2023041403082225100_"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1074\/mcp.M116.060301","article-title":"Proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction","volume":"16","author":"Wang","year":"2017","journal-title":"Mol. Cell. Proteomics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/bioinformatics\/btm087","article-title":"A new method to measure the semantic similarity of go terms","volume":"23","author":"Wang","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"e66745","DOI":"10.1371\/journal.pone.0066745","article-title":"Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge-and IC-based hybrid method","volume":"8","author":"Wu","year":"2013","journal-title":"PLoS One"},{"first-page":"2048","year":"2015","author":"Xu","key":"2023041403082225100_"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"3547","DOI":"10.1093\/bioinformatics\/bty343","article-title":"MiRGOFS: a GO-based functional similarity measurement for miRNAs, with applications to the prediction of miRNA subcellular localization and miRNA\u2013disease 
association","volume":"34","author":"Yang","year":"2018","journal-title":"Bioinformatics"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1016\/j.jmgm.2017.07.012","article-title":"Prediction of protein structural class for low-similarity sequences using Chou\u2019s pseudo amino acid composition and wavelet denoising","volume":"76","author":"Yu","year":"2017","journal-title":"J. Mol. Graph. Model"},{"key":"2023041403082225100_","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1016\/j.jtbi.2016.04.020","article-title":"Protein\u2013protein interaction inference based on semantic similarity of gene ontology terms","volume":"401","author":"Zhang","year":"2016","journal-title":"J. Theor. Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac304\/43692568\/btac304.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/11\/3051\/49878678\/btac304.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/11\/3051\/49878678\/btac304.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,21]],"date-time":"2023-11-21T04:52:50Z","timestamp":1700542370000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/11\/3051\/6583182"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,5,10]]},"references-count":59,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,5,26]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac304","r
elation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2022,6,1]]},"published":{"date-parts":[[2022,5,10]]}}}