{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:05Z","timestamp":1772138045884,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2021,5,8]],"date-time":"2021-05-08T00:00:00Z","timestamp":1620432000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"German Federal Ministry of Education and Research","award":["01IS18050D"],"award-info":[{"award-number":["01IS18050D"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>As machine learning and artificial intelligence increasingly attain a larger number of applications in the biomedical domain, at their core, their utility depends on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge around known biological interactions with patient data. Here, we present CLinical Embedding of Patients (CLEP), a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in classifying between patients and healthy controls for a variety of machine learning models, as compared to the use of the original transcriptomics data. Furthermore, we also show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open source Python package together with examples and documentation.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>CLEP is available to the bioinformatics community as an open source Python package at https:\/\/github.com\/hybrid-kg\/clep under the Apache 2.0 License.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab340","type":"journal-article","created":{"date-parts":[[2021,5,3]],"date-time":"2021-05-03T07:25:31Z","timestamp":1620026731000},"page":"3311-3318","source":"Crossref","is-referenced-by-count":14,"title":["CLEP: a hybrid data- and knowledge-driven framework for generating patient representations"],"prefix":"10.1093","volume":"37","author":[{"given":"Vinay Srinivas","family":"Bharadhwaj","sequence":"first","affiliation":[{"name":"Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing , 53757 Sankt Augustin, Germany"},{"name":"Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn , 53115 Bonn, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1653-3920","authenticated-orcid":false,"given":"Mehdi","family":"Ali","sequence":"additional","affiliation":[{"name":"Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn , 53113 Bonn, Germany"},{"name":"Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) , Dresden and Sankt Augustin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Colin","family":"Birkenbihl","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing , 53757 Sankt Augustin, Germany"},{"name":"Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn , 53115 Bonn, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sarah","family":"Mubeen","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing , 53757 Sankt Augustin, Germany"},{"name":"Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn , 53115 Bonn, Germany"},{"name":"Fraunhofer Center for Machine Learning , Bonn, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jens","family":"Lehmann","sequence":"additional","affiliation":[{"name":"Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn , 53113 Bonn, Germany"},{"name":"Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) , Dresden and Sankt Augustin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Hofmann-Apitius","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing , 53757 Sankt Augustin, Germany"},{"name":"Bonn-Aachen International Center for Information Technology (B-IT), University of Bonn , 53115 Bonn, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4423-4370","authenticated-orcid":false,"given":"Charles Tapley","family":"Hoyt","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing , 53757 Sankt Augustin, Germany"},{"name":"Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn , 53113 Bonn, Germany"},{"name":"Fraunhofer Center for Machine Learning , Bonn, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2046-6145","authenticated-orcid":false,"given":"Daniel","family":"Domingo-Fern\u00e1ndez","sequence":"additional","affiliation":[{"name":"Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing , 53757 Sankt Augustin, Germany"},{"name":"Rheinische Friedrich-Wilhelms-Universit\u00e4t Bonn , 53113 Bonn, Germany"},{"name":"Fraunhofer Center for Machine Learning , Bonn, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,5,8]]},"reference":[{"key":"2023051608282432900_btab340-B1","first-page":"1","article-title":"PyKEEN 1.0: a Python library for training and evaluating knowledge graph embeddings","volume":"22","author":"Ali","year":"2021","journal-title":"J. Mach. Learn. Res"},{"key":"2023051608282432900_btab340-B2","author":"Ali","year":"2020"},{"key":"2023051608282432900_btab340-B3","first-page":"2787","article-title":"Translating embeddings for modeling multi-relational data","author":"Bordes","year":"2013","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023051608282432900_btab340-B4","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1016\/j.ccell.2017.05.005","article-title":"Intertumoral heterogeneity within medulloblastoma subgroups","volume":"31","author":"Cavalli","year":"2017","journal-title":"Cancer Cell"},{"key":"2023051608282432900_btab340-B5","first-page":"785","author":"Chen","year":"2016"},{"key":"2023051608282432900_btab340-B6","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn"},{"key":"2023051608282432900_btab340-B7","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1186\/s12859-019-2863-9","article-title":"PathMe: merging and exploring mechanistic pathway knowledge","volume":"20","author":"Domingo-Fern\u00e1ndez","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023051608282432900_btab340-B8","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1093\/nsr\/nwt032","article-title":"Challenges of big data analysis","volume":"1","author":"Fan","year":"2014","journal-title":"Natl. Sci. Rev"},{"key":"2023051608282432900_btab340-B9","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1186\/s12916-018-1122-7","article-title":"From hype to reality: data science enabling personalized medicine","volume":"16","author":"Fr\u00f6hlich","year":"2018","journal-title":"BMC Medicine"},{"key":"2023051608282432900_btab340-B10","doi-asserted-by":"crossref","first-page":"100174","DOI":"10.1016\/j.bdr.2020.100174","article-title":"SMR: medical knowledge graph embedding for safe medicine recommendation","volume":"23","author":"Gong","year":"2021","journal-title":"Big Data Res"},{"key":"2023051608282432900_btab340-B11","first-page":"855","author":"Grover","year":"2016"},{"key":"2023051608282432900_btab340-B12","doi-asserted-by":"crossref","first-page":"e0200003","DOI":"10.1371\/journal.pone.0200003","article-title":"Inference of cell type content from human brain transcriptomic datasets illuminates the effects of age, manner of death, dissection, and psychiatric diagnosis","volume":"13","author":"Hagenauer","year":"2018","journal-title":"PLoS One"},{"key":"2023051608282432900_btab340-B13","first-page":"780","author":"Hanhij\u00e4rvi","year":"2009"},{"key":"2023051608282432900_btab340-B14","doi-asserted-by":"crossref","first-page":"e1004259","DOI":"10.1371\/journal.pcbi.1004259","article-title":"Heterogeneous network edge prediction: a data integration approach to prioritize disease-associated genes","volume":"11","author":"Himmelstein","year":"2015","journal-title":"PLoS Comput. Biol"},{"key":"2023051608282432900_btab340-B15","first-page":"278","author":"Ho","year":"1995"},{"key":"2023051608282432900_btab340-B16","doi-asserted-by":"crossref","first-page":"bax059","DOI":"10.1093\/database\/bax059","article-title":"BioSearch: a semantic search engine for Bio2RDF","volume":"2017","author":"Hu","year":"2017","journal-title":"Database"},{"key":"2023051608282432900_btab340-B17","first-page":"D498","article-title":"The reactome pathway knowledgebase","volume":"48","author":"Jassal","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023051608282432900_btab340-B18","doi-asserted-by":"crossref","first-page":"D353","DOI":"10.1093\/nar\/gkw1092","article-title":"KEGG: new perspectives on genomes, pathways, diseases and drugs","volume":"45","author":"Kanehisa","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051608282432900_btab340-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-29433-3","article-title":"Using multi-scale genetic, neuroimaging and clinical data for predicting Alzheimer\u2019s disease and reconstruction of relevant biological mechanisms","volume":"8","author":"Khanna","year":"2018","journal-title":"Sci. Rep"},{"key":"2023051608282432900_btab340-B20","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1186\/1471-2105-9-559","article-title":"WGCNA: an R package for weighted correlation network analysis","volume":"9","author":"Langfelder","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023051608282432900_btab340-B21","doi-asserted-by":"crossref","first-page":"156663","DOI":"10.1109\/ACCESS.2020.3019577","article-title":"Patient similarity via joint embeddings of medical knowledge graph and medical entity descriptions","volume":"8","author":"Lin","year":"2020","journal-title":"IEEE Access"},{"key":"2023051608282432900_btab340-B22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s41512-020-00075-2","article-title":"Logistic regression has similar performance to optimised machine learning algorithms in a clinical setting: application to the discrimination between type 1 and type 2 diabetes in young adults","volume":"4","author":"Lynam","year":"2020","journal-title":"Diagn. Prognostic Res"},{"key":"2023051608282432900_btab340-B23","doi-asserted-by":"crossref","first-page":"3806","DOI":"10.1002\/1873-3468.13082","article-title":"The role of heparan sulfates in protein aggregation and their potential impact on neurodegeneration","volume":"592","author":"Ma\u00efza","year":"2018","journal-title":"FEBS Lett"},{"key":"2023051608282432900_btab340-B24","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.jalz.2005.06.003","article-title":"Ways toward an early diagnosis in Alzheimer\u2019s disease: the Alzheimer\u2019s Disease Neuroimaging Initiative (ADNI)","volume":"1","author":"Mueller","year":"2005","journal-title":"Alzheimer's Dementia"},{"key":"2023051608282432900_btab340-B25","article-title":"GuiltyTargets: prioritization of novel therapeutic targets with deep network representation learning","author":"Muslu","year":"2020","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinf"},{"key":"2023051608282432900_btab340-B26","author":"Nickel","year":"2016"},{"key":"2023051608282432900_btab340-B27","doi-asserted-by":"crossref","first-page":"D358","DOI":"10.1093\/nar\/gkt1115","article-title":"The MIntAct project\u2014IntAct as a common curation platform for 11 molecular interaction databases","volume":"42","author":"Orchard","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051608282432900_btab340-B28","doi-asserted-by":"crossref","first-page":"D529","DOI":"10.1093\/nar\/gky1079","article-title":"The BioGRID interaction database: 2019 update","volume":"47","author":"Oughtred","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051608282432900_btab340-B29","doi-asserted-by":"crossref","first-page":"2924","DOI":"10.1016\/j.jmb.2018.05.037","article-title":"Patient similarity networks for precision medicine","volume":"430","author":"Pai","year":"2018","journal-title":"J. Mol. Biol"},{"key":"2023051608282432900_btab340-B30","doi-asserted-by":"crossref","first-page":"e8497","DOI":"10.15252\/msb.20188497","article-title":"netDx: interpretable patient classification using integrated patient similarity networks","volume":"15","author":"Pai","year":"2019","journal-title":"Mol. Syst. Biol"},{"key":"2023051608282432900_btab340-B31","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051608282432900_btab340-B32","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/j.ccell.2017.07.007","article-title":"Integrated genomic characterization of pancreatic ductal adenocarcinoma","volume":"32","author":"Raphael","year":"2017","journal-title":"Cancer Cell"},{"key":"2023051608282432900_btab340-B33","first-page":"D489","article-title":"Pathway Commons 2019 Update: integration, analysis and exploration of pathway data","volume":"48","author":"Rodchenkov","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023051608282432900_btab340-B34","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision\u2013recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2023051608282432900_btab340-B35","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1016\/j.jalz.2015.05.009","article-title":"Genetic studies of quantitative MCI and AD phenotypes in ADNI: progress, opportunities, and plans","volume":"11","author":"Saykin","year":"2015","journal-title":"Alzheimer's Dementia"},{"key":"2023051608282432900_btab340-B36","doi-asserted-by":"crossref","first-page":"D661","DOI":"10.1093\/nar\/gkx1064","article-title":"WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research","volume":"46","author":"Slenter","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023051608282432900_btab340-B37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-020-3427-8","article-title":"Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data","volume":"21","author":"Smith","year":"2020","journal-title":"BMC Bioinformatics"},{"key":"2023051608282432900_btab340-B38","author":"Sun","year":"2019"},{"key":"2023051608282432900_btab340-B39","doi-asserted-by":"crossref","first-page":"3006","DOI":"10.1093\/brain\/awl249","article-title":"Role of toll-like receptor signalling in A\u03b2 uptake and clearance","volume":"129","author":"Tahara","year":"2006","journal-title":"Brain"},{"key":"2023051608282432900_btab340-B40","first-page":"1067","author":"Tang","year":"2015"},{"key":"2023051608282432900_btab340-B41","first-page":"2071","author":"Trouillon","year":"2016"},{"key":"2023051608282432900_btab340-B42","article-title":"Estrogen receptor beta (ESR2) gene polymorphism and susceptibility to dementia","author":"Ulhaq","year":"2020","journal-title":"Acta Neurol. Belgica"},{"key":"2023051608282432900_btab340-B43","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1159\/000110455","article-title":"Role of the toll-like receptor 4 in neuroinflammation in Alzheimer's disease","volume":"20","author":"Walter","year":"2007","journal-title":"Cell Physiol. Biochem"},{"key":"2023051608282432900_btab340-B44","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1038\/nmeth.2810","article-title":"Similarity network fusion for aggregating data types on a genomic scale","volume":"11","author":"Wang","year":"2014","journal-title":"Nat. Methods"},{"key":"2023051608282432900_btab340-B45","first-page":"1112","article-title":"Knowledge graph embedding by translating on hyperplanes","volume":"14","author":"Wang","year":"2014","journal-title":"AAAI"},{"key":"2023051608282432900_btab340-B46","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s41048-019-0086-2","article-title":"Identification of key genes and pathways for Alzheimer\u2019s disease via combined analysis of genome-wide expression profiling in the hippocampus","volume":"5","author":"Wu","year":"2019","journal-title":"Biophys. Rep"},{"key":"2023051608282432900_btab340-B47","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1186\/s13059-019-1689-0","article-title":"Machine learning and complex biological data","volume":"20","author":"Xu","year":"2019","journal-title":"Genome Biol"},{"key":"2023051608282432900_btab340-B48","doi-asserted-by":"crossref","first-page":"200","DOI":"10.5808\/GI.2013.11.4.200","article-title":"Review of biological network data and its applications","volume":"11","author":"Yu","year":"2013","journal-title":"Genomics Inf"},{"key":"2023051608282432900_btab340-B49","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1016\/j.arr.2015.08.001","article-title":"Estrogen receptor \u03b2 in Alzheimer\u2019s disease: from mechanisms to therapeutics","volume":"24","author":"Zhao","year":"2015","journal-title":"Ageing Res. Rev"},{"key":"2023051608282432900_btab340-B50","doi-asserted-by":"crossref","first-page":"i457","DOI":"10.1093\/bioinformatics\/bty294","article-title":"Modeling polypharmacy side effects with graph convolutional networks","volume":"34","author":"Zitnik","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051608282432900_btab340-B51","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/j.inffus.2018.09.012","article-title":"Machine learning for integrating data in biology and medicine: principles, practice, and opportunities","volume":"50","author":"Zitnik","year":"2019","journal-title":"Inf. Fusion"},{"key":"2023051608282432900_btab340-B52","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1111\/j.1467-9868.2005.00503.x","article-title":"Regularization and variable selection via the elastic net","volume":"67","author":"Zou","year":"2005","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab340\/38603291\/btab340.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3311\/50338610\/btab340.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3311\/50338610\/btab340.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T04:44:28Z","timestamp":1684212268000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/19\/3311\/6272574"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,5,8]]},"references-count":52,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2021,10,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab340","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.08.20.259226","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,10,1]]},"published":{"date-parts":[[2021,5,8]]}}}