{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T10:19:14Z","timestamp":1773742754912,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2022,11,16]],"date-time":"2022-11-16T00:00:00Z","timestamp":1668556800000},"content-version":"vor","delay-in-days":15,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,11,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Recent advances in Knowledge Graphs (KGs) and Knowledge Graph Embedding Models (KGEMs) have led to their adoption in a broad range of fields and applications. The current publishing system in machine learning requires newly introduced KGEMs to achieve state-of-the-art performance, surpassing at least one benchmark in order to be published. Despite this, dozens of novel architectures are published every year, making it challenging for users, even within the field, to deduce the most suitable configuration for a given application. A typical biomedical application of KGEMs is drug\u2013disease prediction in the context of drug discovery, in which a KGEM is trained to predict triples linking drugs and diseases. These predictions can be later tested in clinical trials following extensive experimental validation. However, given the infeasibility of evaluating each of these predictions and that only a minimal number of candidates can be experimentally tested, models that yield higher precision on the top prioritized triples are preferred. In this paper, we apply the concept of ensemble learning on KGEMs for drug discovery to assess whether combining the predictions of several models can lead to an overall improvement in predictive performance. First, we trained and benchmarked 10 KGEMs to predict drug\u2013disease triples on two independent biomedical KGs designed for drug discovery. Following, we applied different ensemble methods that aggregate the predictions of these models by leveraging the distribution or the position of the predicted triple scores. We then demonstrate how the ensemble models can achieve better results than the original KGEMs by benchmarking the precision (i.e., number of true positives prioritized) of their top predictions. Lastly, we released the source code presented in this work at https:\/\/github.com\/enveda\/kgem-ensembles-in-drug-discovery.<\/jats:p>","DOI":"10.1093\/bib\/bbac481","type":"journal-article","created":{"date-parts":[[2022,11,17]],"date-time":"2022-11-17T00:41:36Z","timestamp":1668645696000},"source":"Crossref","is-referenced-by-count":17,"title":["Ensembles of knowledge graph embedding models improve predictions for drug discovery"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0163-7890","authenticated-orcid":false,"given":"Daniel","family":"Rivas-Barragan","sequence":"first","affiliation":[{"name":"Enveda Biosciences , Boulder, CO , USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2046-6145","authenticated-orcid":false,"given":"Daniel","family":"Domingo-Fern\u00e1ndez","sequence":"additional","affiliation":[{"name":"Enveda Biosciences , Boulder, CO , USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7683-0452","authenticated-orcid":false,"given":"Yojana","family":"Gadiya","sequence":"additional","affiliation":[{"name":"Enveda Biosciences , Boulder, CO , USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9584-9757","authenticated-orcid":false,"given":"David","family":"Healey","sequence":"additional","affiliation":[{"name":"Enveda Biosciences , Boulder, CO , USA"}]}],"member":"286","published-online":{"date-parts":[[2022,11,16]]},"reference":[{"issue":"13","key":"2022112111204349600_ref1","doi-asserted-by":"crossref","first-page":"i457","DOI":"10.1093\/bioinformatics\/bty294","article-title":"Modeling polypharmacy side effects with graph convolutional networks","volume":"34","author":"Zitnik","year":"2018","journal-title":"Bioinformatics"},{"key":"2022112111204349600_ref2","doi-asserted-by":"crossref","first-page":"381","DOI":"10.3389\/fgene.2019.00381","article-title":"To embed or not: network embedding as a paradigm in computational biology","volume":"10","author":"Nelson","year":"2019","journal-title":"Front Genet"},{"key":"2022112111204349600_ref3","doi-asserted-by":"crossref","first-page":"8404","DOI":"10.1109\/ACCESS.2018.2886311","article-title":"GrEDeL: a knowledge graph embedding based method for drug discovery from biomedical literature","volume":"7","author":"Sang","year":"2019","journal-title":"IEEE Access"},{"key":"2022112111204349600_ref4","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1007\/978-3-030-77385-4_22","volume-title":"European Semantic Web Conference","author":"Liu","year":"2021"},{"issue":"1","key":"2022112111204349600_ref5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-021-04082-y","article-title":"Application of network link prediction in drug discovery","volume":"22","author":"Abbas","year":"2021","journal-title":"BMC Bioinformatics"},{"issue":"12","key":"2022112111204349600_ref6","doi-asserted-by":"crossref","first-page":"e1008464","DOI":"10.1371\/journal.pcbi.1008464","article-title":"Drug2ways: reasoning over causal paths in biological networks for drug discovery","volume":"16","author":"Rivas-Barragan","year":"2020","journal-title":"PLoS Comput Biol"},{"issue":"1","key":"2022112111204349600_ref7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-018-2163-9","article-title":"Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches","volume":"19","author":"Crichton","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2022112111204349600_ref8","article-title":"Representation learning on graphs: methods and applications","author":"Hamilton","year":"2017","journal-title":"IEEE Data Eng Bull"},{"key":"2022112111204349600_ref9","doi-asserted-by":"crossref","first-page":"bbac404","DOI":"10.1093\/bib\/bbac404","article-title":"A review of biomedical datasets relating to drug discovery: a knowledge graph perspective","author":"Bonner","year":"2022","journal-title":"Brief Bioinform"},{"issue":"1","key":"2022112111204349600_ref10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-022-04608-y","article-title":"Task-driven knowledge graph filtering improves prioritizing drugs for repurposing","volume":"23","author":"Ratajczak","year":"2022","journal-title":"BMC Bioinformatics"},{"key":"2022112111204349600_ref11","first-page":"1","article-title":"Bringing light into the dark: a large-scale evaluation of knowledge graph embedding models under a unified framework","author":"Ali","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"5","key":"2022112111204349600_ref12","doi-asserted-by":"crossref","first-page":"bbac279","DOI":"10.1093\/bib\/bbac279","article-title":"Implications of topological imbalance for representation learning on biomedical knowledge graphs","volume":"23","author":"Bonner","year":"2022","journal-title":"Brief Bioinform"},{"key":"2022112111204349600_ref13","first-page":"167","volume-title":"Proceedings of the Conference","author":"Chang","year":"2020"},{"key":"2022112111204349600_ref14","first-page":"100036","article-title":"Understanding the performance of knowledge graph embeddings in drug discovery","volume":"2","author":"Bonner","year":"2022","journal-title":"Artif Intell Life Sci"},{"key":"2022112111204349600_ref15","volume-title":"PKDD ECML 2nd Workshop on Linked Data for Knowledge Discovery","author":"Krompa\u00df","year":"2015"},{"key":"2022112111204349600_ref16","volume-title":"ICML 2011","author":"Nickel","year":"2011"},{"key":"2022112111204349600_ref17","first-page":"2787","volume-title":"Neural Information Processing Systems","author":"Bordes","year":"2013"},{"key":"2022112111204349600_ref18","first-page":"601","volume-title":"Knowledge vault: a web-scale approach to probabilistic knowledge fusion","author":"Dong","year":"2014"},{"issue":"3","key":"2022112111204349600_ref19","first-page":"61","article-title":"Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods","volume":"10","author":"Platt","year":"1999","journal-title":"Adv Large Margin Class"},{"issue":"8","key":"2022112111204349600_ref20","doi-asserted-by":"crossref","first-page":"2651","DOI":"10.3390\/app10082651","article-title":"An approach to knowledge base completion by a committee-based knowledge graph embedding","volume":"10","author":"Choi","year":"2020","journal-title":"Appl Sci"},{"issue":"8","key":"2022112111204349600_ref21","doi-asserted-by":"crossref","first-page":"1771","DOI":"10.1162\/089976602760128018","article-title":"Training products of experts by minimizing contrastive divergence","volume":"14","author":"Hinton","year":"2002","journal-title":"Neural Comput"},{"key":"2022112111204349600_ref22","first-page":"1","volume-title":"International Joint Conference on Neural Networks (IJCNN)","author":"Xu","year":"2021"},{"key":"2022112111204349600_ref23","article-title":"Embedding entities and relations for learning and inference in knowledge bases","author":"Yang","year":"2014"},{"key":"2022112111204349600_ref24","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Wang","year":"2014"},{"key":"2022112111204349600_ref25","first-page":"2071","article-title":"Complex embeddings for simple link prediction","author":"Trouillon","journal-title":"Int Conf Mach Learn"},{"key":"2022112111204349600_ref26","volume-title":"30th AAAI Conference on Artificial Intelligence","author":"Nickel","year":"2016"},{"key":"2022112111204349600_ref27","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Dettmers","year":"2018"},{"key":"2022112111204349600_ref28","article-title":"RotatE: knowledge graph embedding by relational rotation in complex space","author":"Sun","year":"2019"},{"key":"2022112111204349600_ref29","article-title":"Multi-relational poincar\u00e9 graph embeddings","volume":"32","author":"Balazevic","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"issue":"1","key":"2022112111204349600_ref30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-020-74922-z","article-title":"Preclinical validation of therapeutic targets predicted by tensor factorization on heterogeneous graphs","volume":"10","author":"Paliwal","year":"2020","journal-title":"Sci Rep"},{"issue":"4","key":"2022112111204349600_ref31","doi-asserted-by":"crossref","first-page":"bbaa344","DOI":"10.1093\/bib\/bbaa344","article-title":"PharmKG: a dedicated knowledge graph benchmark for biomedical data mining","volume":"22","author":"Zheng","year":"2021","journal-title":"Brief Bioinform"},{"key":"2022112111204349600_ref32","doi-asserted-by":"crossref","first-page":"3173","DOI":"10.1145\/3340531.3412776","volume-title":"Proceedings of the 29th ACM International Conference on Information & Knowledge Management","author":"Walsh","year":"2020"},{"issue":"13","key":"2022112111204349600_ref33","doi-asserted-by":"crossref","first-page":"4097","DOI":"10.1093\/bioinformatics\/btaa274","article-title":"OpenBioLink: a benchmarking framework for large-scale biomedical link prediction","volume":"36","author":"Breit","year":"2020","journal-title":"Bioinformatics"},{"issue":"82","key":"2022112111204349600_ref34","first-page":"1","article-title":"PyKEEN 1.0: a python library for training and evaluating knowledge graph embeddings","volume":"22","author":"Ali","year":"2021","journal-title":"J Mach Learn Res"},{"key":"2022112111204349600_ref35","doi-asserted-by":"crossref","first-page":"e26726","DOI":"10.7554\/eLife.26726","article-title":"Systematic integration of biomedical knowledge prioritizes drugs for repurposing","volume":"6","author":"Himmelstein","year":"2017","journal-title":"Elife"},{"key":"2022112111204349600_ref36","doi-asserted-by":"crossref","first-page":"692","DOI":"10.1038\/s41587-021-01145-6","article-title":"A knowledge graph to interpret clinical proteomics data","volume":"40","author":"Santos","year":"2022","journal-title":"Nat Biotechnol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/6\/bbac481\/47144689\/bbac481.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/6\/bbac481\/47144689\/bbac481.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,21]],"date-time":"2022-11-21T11:30:05Z","timestamp":1669030205000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac481\/6831005"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11]]},"references-count":36,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,11,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac481","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,11]]},"published":{"date-parts":[[2022,11]]},"article-number":"bbac481"}}