{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T23:26:48Z","timestamp":1777418808258,"version":"3.51.4"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T00:00:00Z","timestamp":1768089600000},"content-version":"vor","delay-in-days":10,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Fonds de la Recherche Scientifique (F.R.S-F.N.R.S) with the Fund for Research Training in Industry and Agriculture","award":["40021542"],"award-info":[{"award-number":["40021542"]}]},{"name":"Fonds de la Recherche Scientifique (F.R.S-F.N.R.S) with the Fund for Research Training in Industry and Agriculture","award":["40037241"],"award-info":[{"award-number":["40037241"]}]},{"DOI":"10.13039\/501100002301","name":"Estonian Research Council","doi-asserted-by":"publisher","award":["PRG1021"],"award-info":[{"award-number":["PRG1021"]}],"id":[{"id":"10.13039\/501100002301","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002301","name":"Estonian Research Council","doi-asserted-by":"publisher","award":["40028692"],"award-info":[{"award-number":["40028692"]}],"id":[{"id":"10.13039\/501100002301","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,1,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Identifying the potential oligogenic causes of rare diseases remains a challenge, notwithstanding the advancements made in the last decade. While a variety of predictive and ranking approaches have been proposed, their precision remains limited, as only a small number of high-quality training cases are available and it remains difficult to know which features may be most relevant for the design of new predictors. 
We hypothesize here that structured biological information, which integrates various relevant biological networks and ontologies in a single heterogeneous knowledge graph, can make a difference, as it allows a relevant genetic representation to be learned through knowledge graph embedding (KGE) methods. We perform an exhaustive benchmark in which we assess the performance of various state-of-the-art embedding models for the task of identifying potentially pathogenic gene pairs. The results show that these KGE provide highly accurate predictions, reaching an Area Under the Precision-Recall Curve of up to 0.93, a significant advancement over previous approaches for predicting gene pairs involved in oligogenic diseases. We show nonetheless that care needs to be taken in the cross-validation when using embeddings, as data leakage between folds in embedding space yields overly optimistic results. Further evaluation of the methods on a holdout set, as well as on a group of new male infertility cases, shows that three Translational Distance models (TransE, MurE, and RotatE) and two of the Semantic Matching models (DistMult and QuatE) provide the best results. The analysis concludes by comparing all known gene combinations for these top-ranking models, examining their similarities and differences. 
Overall, KGE provide a predictive advancement, but new steps will need to be taken to generate explanations as to why the pairs are relevant for oligogenic diseases.<\/jats:p>","DOI":"10.1093\/bib\/bbaf712","type":"journal-article","created":{"date-parts":[[2026,1,3]],"date-time":"2026-01-03T12:25:44Z","timestamp":1767443144000},"source":"Crossref","is-referenced-by-count":1,"title":["Benchmarking knowledge graph embedding models for the prediction of oligogenic combinations"],"prefix":"10.1093","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-7776-5988","authenticated-orcid":false,"given":"Inas","family":"Bosch","sequence":"first","affiliation":[{"name":"Interuniversity Institute of Bioinformatics in Brussels, Universit\u00e9 Libre de Bruxelles-Vrije Universiteit Brussel , Boulevard du Triomphe CP263, 1050 Brussels ,","place":["Belgium"]},{"name":"Machine Learning Group, Universit\u00e9 Libre de Bruxelles , Boulevard du Triomphe CP212, 1050 Brussels ,","place":["Belgium"]},{"name":"Artificial Intelligence Laboratory, Vrije Universiteit Brussels , Pleinlaan 9, 1050 Brussels ,","place":["Belgium"]},{"name":"FARI Institute, Universit\u00e9 Libre de Bruxelles -Vrije Universiteit Brussels , Cantersteen 16, 1000 Brussels ,","place":["Belgium"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5422-2376","authenticated-orcid":false,"given":"Barbara","family":"Gravel","sequence":"additional","affiliation":[{"name":"Interuniversity Institute of Bioinformatics in Brussels, Universit\u00e9 Libre de Bruxelles-Vrije Universiteit Brussel , Boulevard du Triomphe CP263, 1050 Brussels ,","place":["Belgium"]},{"name":"Machine Learning Group, Universit\u00e9 Libre de Bruxelles , Boulevard du Triomphe CP212, 1050 Brussels ,","place":["Belgium"]},{"name":"Artificial Intelligence Laboratory, Vrije Universiteit Brussels , Pleinlaan 9, 1050 Brussels 
,","place":["Belgium"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4339-2791","authenticated-orcid":false,"given":"Alexandre","family":"Renaux","sequence":"additional","affiliation":[{"name":"Interuniversity Institute of Bioinformatics in Brussels, Universit\u00e9 Libre de Bruxelles-Vrije Universiteit Brussel , Boulevard du Triomphe CP263, 1050 Brussels ,","place":["Belgium"]},{"name":"Machine Learning Group, Universit\u00e9 Libre de Bruxelles , Boulevard du Triomphe CP212, 1050 Brussels ,","place":["Belgium"]},{"name":"Artificial Intelligence Laboratory, Vrije Universiteit Brussels , Pleinlaan 9, 1050 Brussels ,","place":["Belgium"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6346-4564","authenticated-orcid":false,"given":"Ann","family":"Now\u00e9","sequence":"additional","affiliation":[{"name":"Interuniversity Institute of Bioinformatics in Brussels, Universit\u00e9 Libre de Bruxelles-Vrije Universiteit Brussel , Boulevard du Triomphe CP263, 1050 Brussels ,","place":["Belgium"]},{"name":"Artificial Intelligence Laboratory, Vrije Universiteit Brussels , Pleinlaan 9, 1050 Brussels ,","place":["Belgium"]},{"name":"FARI Institute, Universit\u00e9 Libre de Bruxelles -Vrije Universiteit Brussels , Cantersteen 16, 1000 Brussels ,","place":["Belgium"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8519-243X","authenticated-orcid":false,"given":"Maris","family":"Laan","sequence":"additional","affiliation":[{"name":"Chair of Human Genetics, Institute of Biomedicine and Translational Medicine, University of Tartu , Ravila 19, 50411 Tartu ,","place":["Estonia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3645-1455","authenticated-orcid":false,"given":"Tom","family":"Lenaerts","sequence":"additional","affiliation":[{"name":"Interuniversity Institute of Bioinformatics in Brussels, Universit\u00e9 Libre de Bruxelles-Vrije Universiteit Brussel , Boulevard du Triomphe CP263, 1050 Brussels ,","place":["Belgium"]},{"name":"Machine Learning Group, Universit\u00e9 Libre de Bruxelles , 
Boulevard du Triomphe CP212, 1050 Brussels ,","place":["Belgium"]},{"name":"Artificial Intelligence Laboratory, Vrije Universiteit Brussels , Pleinlaan 9, 1050 Brussels ,","place":["Belgium"]},{"name":"FARI Institute, Universit\u00e9 Libre de Bruxelles -Vrije Universiteit Brussels , Cantersteen 16, 1000 Brussels ,","place":["Belgium"]}]}],"member":"286","published-online":{"date-parts":[[2026,1,11]]},"reference":[{"key":"2026011104410990900_ref1","doi-asserted-by":"publisher","DOI":"10.1201\/9780429329180","volume-title":"Next-Generation Sequencing Data Analysis","author":"Wang","year":"2023"},{"key":"2026011104410990900_ref2","doi-asserted-by":"crossref","first-page":"1271","DOI":"10.1016\/j.tig.2022.07.002","article-title":"Phenotype-aware prioritisation of rare Mendelian disease variants","volume":"38","author":"Kelly","year":"2022","journal-title":"Trends Genet"},{"key":"2026011104410990900_ref3","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbac019","article-title":"Evaluation of phenotype-driven gene prioritization methods for Mendelian diseases","volume":"23","author":"Yuan","year":"2022","journal-title":"Brief Bioinform"},{"key":"2026011104410990900_ref4","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-37654-5","volume-title":"Vogel and Motulsky\u2019s Human Genetics","author":"Speicher","year":"2010"},{"key":"2026011104410990900_ref5","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae184","article-title":"Prioritization of oligogenic variant combinations in whole exomes","volume":"40","author":"Gravel","year":"2024","journal-title":"Bioinformatics"},{"key":"2026011104410990900_ref6","doi-asserted-by":"publisher","first-page":"D900","DOI":"10.1093\/nar\/gkv1068","article-title":"DIDA: a curated and annotated digenic diseases database","volume":"44","author":"Gazzo","year":"2016","journal-title":"Nucleic Acids 
Res"},{"key":"2026011104410990900_ref7","doi-asserted-by":"publisher","first-page":"baac023","DOI":"10.1093\/database\/baac023","article-title":"Scaling up oligogenic diseases research with OLIDA: the Oligogenic diseases database","volume":"2022","author":"Nachtegael","year":"2022","journal-title":"Database (Oxford)"},{"key":"2026011104410990900_ref8","doi-asserted-by":"publisher","first-page":"11878","DOI":"10.1073\/pnas.1815601116","article-title":"Predicting disease-causing variant combinations","volume":"116","author":"Papadimitriou","year":"2019","journal-title":"Proc Natl Acad Sci USA"},{"key":"2026011104410990900_ref9","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-023-05291-3","article-title":"Faster and more accurate pathogenic combination predictions with VarCoPP2.0","volume":"24","author":"Versbraegen","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"2026011104410990900_ref10","doi-asserted-by":"publisher","first-page":"1946","DOI":"10.1016\/j.ajhg.2021.08.010","article-title":"Identifying digenic disease genes via machine learning in the undiagnosed diseases network","volume":"108","author":"Mukherjee","year":"2021","journal-title":"Am J Hum Genet"},{"key":"2026011104410990900_ref11","doi-asserted-by":"publisher","first-page":"3639","DOI":"10.1016\/j.csbj.2022.07.011","article-title":"An accurate prediction model of digenic interaction for estimating pathogenic gene pairs of human diseases","volume":"20","author":"Yuan","year":"2022","journal-title":"Comput Struct Biotechnol J"},{"key":"2026011104410990900_ref12","doi-asserted-by":"publisher","first-page":"324","DOI":"10.1186\/s12859-023-05451-5","article-title":"A knowledge graph approach to predict and interpret disease-causing gene interactions","volume":"24","author":"Renaux","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"2026011104410990900_ref13","doi-asserted-by":"publisher","first-page":"2724","DOI":"10.1109\/TKDE.2017.2754499","article-title":"Knowledge graph 
embedding: a survey of approaches and applications","volume":"29","author":"Wang","year":"2017","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2026011104410990900_ref14","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1007\/978-3-662-44848-9_11","article-title":"Open question answering with weakly supervised embedding models","volume-title":"Machine Learning and Knowledge Discovery in Databases","author":"Bordes","year":"2014"},{"key":"2026011104410990900_ref15","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/MC.2009.263","article-title":"Matrix factorization techniques for recommender systems","volume":"42","author":"Koren","year":"2009","journal-title":"Computer"},{"key":"2026011104410990900_ref16","doi-asserted-by":"publisher","first-page":"1679","DOI":"10.1093\/bib\/bbaa012","article-title":"Biological applications of knowledge graph embedding models","volume":"22","author":"Mohamed","year":"2021","journal-title":"Brief Bioinform"},{"key":"2026011104410990900_ref17","doi-asserted-by":"publisher","first-page":"584","DOI":"10.1007\/978-3-030-86230-5_46","article-title":"Biomedical knowledge graph Embeddings for personalized medicine","volume-title":"Progress in Artificial Intelligence","author":"Vilela","year":"2021"},{"key":"2026011104410990900_ref18","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1186\/s13326-023-00291-x","article-title":"Multi-domain knowledge graph embeddings for gene-disease association prediction","volume":"14","author":"Nunes","year":"2023","journal-title":"J Biomed Semant"},{"key":"2026011104410990900_ref19","doi-asserted-by":"publisher","first-page":"3","DOI":"10.3233\/SW-190368","article-title":"Neural-symbolic integration and the semantic web","volume":"11","author":"Hitzler","year":"2020","journal-title":"Semantic Web"},{"key":"2026011104410990900_ref20","doi-asserted-by":"publisher","DOI":"10.3389\/fendo.2024.1312357","article-title":"Undiagnosed RASopathies in infertile 
men","volume":"15","author":"Juchnewitsch","year":"2024","journal-title":"Front Endocrinol"},{"key":"2026011104410990900_ref21","doi-asserted-by":"publisher","first-page":"877","DOI":"10.1016\/j.ajhg.2024.03.013","article-title":"Toward clinical exomes in diagnostics and management of male infertility","volume":"111","author":"Lillepea","year":"2024","journal-title":"Am J Hum Genet"},{"key":"2026011104410990900_ref22","doi-asserted-by":"publisher","first-page":"012016","DOI":"10.1088\/1742-6596\/1487\/1\/012016","article-title":"A survey on application of knowledge graph","volume":"1487","author":"Zou","year":"2020","journal-title":"J Phys Conf Ser"},{"key":"2026011104410990900_ref23","volume-title":"BOCK: Biological Networks and Oligogenic Combinations as a Knowledge Graph","author":"Renaux","year":"2024"},{"key":"2026011104410990900_ref24","first-page":"809","article-title":"A three-way model for collective learning on multi-relational data","volume-title":"Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML\u201911","author":"Nickel","year":"2011"},{"key":"2026011104410990900_ref25","doi-asserted-by":"publisher","first-page":"8825","DOI":"10.1109\/TPAMI.2021.3124805","article-title":"Bringing light into the dark: a large-scale evaluation of knowledge graph embedding models under a unified framework","volume":"44","author":"Ali","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2026011104410990900_ref26","article-title":"Translating embeddings for modeling multi-relational data","volume-title":"Advances in Neural Information Processing Systems","author":"Bordes","year":"2013"},{"key":"2026011104410990900_ref27","article-title":"Multi-relational Poincar\u00e9 graph embeddings","volume-title":"Advances in Neural Information Processing Systems","author":"Balazevic","year":"2019"},{"key":"2026011104410990900_ref28","article-title":"RotatE: knowledge graph embedding by relational rotation in 
complex space","author":"Sun","year":"2019"},{"key":"2026011104410990900_ref29","article-title":"Embedding entities and relations for learning and inference in knowledge bases","author":"Yang","year":"2015"},{"key":"2026011104410990900_ref30","first-page":"2071","article-title":"Complex Embeddings for simple link prediction","volume-title":"Proceedings of The 33rd International Conference on Machine Learning","author":"Trouillon"},{"key":"2026011104410990900_ref31","article-title":"Quaternion knowledge graph embeddings","volume-title":"Advances in Neural Information Processing Systems","author":"Zhang","year":"2019"},{"key":"2026011104410990900_ref32","first-page":"1811","article-title":"Convolutional 2D knowledge graph embeddings","volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence","author":"Dettmers","year":"2018"},{"key":"2026011104410990900_ref33","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1145\/2623330.2623623","article-title":"Knowledge vault: a web-scale approach to probabilistic knowledge fusion","volume-title":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","author":"Dong","year":"2014"},{"key":"2026011104410990900_ref34","doi-asserted-by":"publisher","first-page":"306","DOI":"10.1186\/s12859-019-2914-2","article-title":"edge2vec: Representation learning using edge semantics for biomedical knowledge discovery","volume":"20","author":"Gao","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2026011104410990900_ref35","doi-asserted-by":"publisher","first-page":"e1003709","DOI":"10.1371\/journal.pgen.1003709","article-title":"Genic intolerance to functional variation and the interpretation of personal genomes","volume":"9","author":"Petrovski","year":"2013","journal-title":"PLoS 
Genet"},{"key":"2026011104410990900_ref36","volume-title":"Using Random Forest to Learn Imbalanced Data","author":"Chen","year":"2004"},{"key":"2026011104410990900_ref37","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1016\/0925-2312(91)90023-5","article-title":"Multilayer perceptrons for classification and regression","volume":"2","author":"Murtagh","year":"1991","journal-title":"Neurocomputing"},{"key":"2026011104410990900_ref38","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach Learn"},{"key":"2026011104410990900_ref39","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Auton","year":"2015","journal-title":"Nature"},{"key":"2026011104410990900_ref40","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"Kircher","year":"2014","journal-title":"Nat Genet"},{"key":"2026011104410990900_ref41","doi-asserted-by":"publisher","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2026011104410990900_ref42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/03610917408548446","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Cali\u0144ski","year":"1974","journal-title":"Commun Stat"},{"key":"2026011104410990900_ref43","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1016\/j.beem.2010.08.006","article-title":"Male infertility: pathogenesis and clinical 
diagnosis","volume":"25","author":"Krausz","year":"2011","journal-title":"Best Pract Res Clin Endocrinol Metab"},{"key":"2026011104410990900_ref44","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1093\/humrep\/dew284","article-title":"Causes of male infertility: a 9-year prospective monocentre study on 1737 patients with reduced total sperm counts","volume":"32","author":"Punab","year":"2016","journal-title":"Hum Reprod"},{"key":"2026011104410990900_ref45","doi-asserted-by":"publisher","DOI":"10.1126\/science.aao1729","article-title":"Systematic analysis of complex genetic interactions","volume":"360","author":"Kuzmin","year":"2018","journal-title":"Science (New York, NY)"},{"key":"2026011104410990900_ref46","article-title":"Knowledge graph embeddings and explainable AI","author":"Bianchi"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/27\/1\/bbaf712\/66342056\/bbaf712.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/27\/1\/bbaf712\/66342056\/bbaf712.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T09:41:18Z","timestamp":1768124478000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf712\/8419939"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,7]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf712","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,1]]},"published":{"date-parts":[[2026,1]]},"article-number":"bbaf712"}}