{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,2]],"date-time":"2026-02-02T13:53:35Z","timestamp":1770040415092,"version":"3.49.0"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"S6","license":[{"start":{"date-parts":[[2023,3,6]],"date-time":"2023-03-06T00:00:00Z","timestamp":1678060800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,6]],"date-time":"2023-03-06T00:00:00Z","timestamp":1678060800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["SA 465\/53-1"],"award-info":[{"award-number":["SA 465\/53-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["HE 8077\/2-1"],"award-info":[{"award-number":["HE 8077\/2-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Graph databases enable efficient storage of heterogeneous, highly-interlinked data, such as clinical data. Subsequently, researchers can extract relevant features from these datasets and apply machine learning for diagnosis, biomarker discovery, or understanding pathogenesis.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>To facilitate machine learning and save time for extracting data from the graph database, we developed and optimized Decision Tree Plug-in (DTP) containing 24 procedures to generate and evaluate decision trees directly in the graph database Neo4j on homogeneous and unconnected nodes.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Creation of the decision tree for three clinical datasets directly in the graph database from the nodes required between 0.059 and 0.099\u00a0s, while calculating the decision tree with the same algorithm in Java from CSV files took 0.085\u20130.112\u00a0s. Furthermore, our approach was faster than the standard decision tree implementations in R (0.62 s) and equal to Python (0.08 s), also using CSV files as input for small datasets. In addition, we have explored the strengths of DTP by evaluating a large dataset (approx. 250,000 instances) to predict patients with diabetes and compared the performance against algorithms generated by state-of-the-art packages in R and Python. By doing so, we have been able to show competitive results on the performance of Neo4j, in terms of quality of predictions as well as time efficiency. Furthermore, we could show that high body-mass index and high blood pressure are the main risk factors for diabetes.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>Overall, our work shows that integrating machine learning into graph databases saves time for additional processes as well as external memory, and could be applied to a variety of use cases, including clinical applications. This provides user with the advantages of high scalability, visualization and complex querying.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-023-02112-8","type":"journal-article","created":{"date-parts":[[2023,3,6]],"date-time":"2023-03-06T17:03:18Z","timestamp":1678122198000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Decision tree learning in Neo4j on homogeneous and unconnected graph nodes from biological and clinical datasets"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7673-9811","authenticated-orcid":false,"given":"Rahul","family":"Mondal","sequence":"first","affiliation":[]},{"given":"Minh Dung","family":"Do","sequence":"additional","affiliation":[]},{"given":"Nasim Uddin","family":"Ahmed","sequence":"additional","affiliation":[]},{"given":"Daniel","family":"Walke","sequence":"additional","affiliation":[]},{"given":"Daniel","family":"Micheel","sequence":"additional","affiliation":[]},{"given":"David","family":"Broneske","sequence":"additional","affiliation":[]},{"given":"Gunter","family":"Saake","sequence":"additional","affiliation":[]},{"given":"Robert","family":"Heyer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,3,6]]},"reference":[{"key":"2112_CR1","doi-asserted-by":"publisher","unstructured":"Santos A, Cola\u00e7o AR, Nielsen AB, Niu L, Geyer PE, Coscia F, Albrechtsen NJW, Mundt F, Jensen LJ, Mann M. Clinical knowledge graph integrates proteomics data into clinical decision-making. bioRxiv 2020; https:\/\/doi.org\/10.1101\/2020.05.09.084897.","DOI":"10.1101\/2020.05.09.084897"},{"key":"2112_CR2","doi-asserted-by":"publisher","DOI":"10.1186\/s12864-019-6413-7","author":"D Chicco","year":"2020","unstructured":"Chicco D, Jurman G. The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC Genomics. 2020. https:\/\/doi.org\/10.1186\/s12864-019-6413-7.","journal-title":"BMC Genomics"},{"issue":"17","key":"2112_CR3","doi-asserted-by":"publisher","first-page":"42","DOI":"10.5120\/ijca2018916410","volume":"180","author":"T Aziz","year":"2018","unstructured":"Aziz T, Haq E-U, Muhammad D. Performance based comparison between RDBMS and OODBMS. Int J Comput Appl. 2018;180(17):42\u20136. https:\/\/doi.org\/10.5120\/ijca2018916410.","journal-title":"Int J Comput Appl"},{"key":"2112_CR4","doi-asserted-by":"publisher","unstructured":"Vicknair C, Macias M, Zhao Z, Nan X, Chen Y, Wilkins D. A comparison of a graph database and a relational database. ACM Press, 2010; https:\/\/doi.org\/10.1145\/1900008.1900067.","DOI":"10.1145\/1900008.1900067"},{"key":"2112_CR5","doi-asserted-by":"crossref","unstructured":"Pokorn J. Graph databases: their power and limitations 2015.","DOI":"10.1007\/978-3-319-24369-6_5"},{"key":"2112_CR6","unstructured":"Marzi M.D. Dynamic rule based decision trees in Neo4j 2018."},{"key":"2112_CR7","unstructured":"Neo4j: User-defined Procedures. https:\/\/neo4j.com\/docs\/java-reference\/current\/extending-neo4j\/procedures-and-functions\/procedures\/."},{"key":"2112_CR8","unstructured":"Michael Hunger R.B, Lyon W. RDBMS and Graphs: SQL vs. Cypher Query Languages 2016."},{"key":"2112_CR9","doi-asserted-by":"publisher","unstructured":"Fernandes D, Bernardino J. Graph databases comparison: Allegrograph, arangodb, infinitegraph, neo4j, and orientdb. In: Proceedings of the 7th international conference on data science, technology and applications. DATA 2018, pp. 373\u2013380. SCITEPRESS\u2014Science and Technology Publications, Lda, 2018; https:\/\/doi.org\/10.5220\/0006910203730380.","DOI":"10.5220\/0006910203730380"},{"issue":"1","key":"2112_CR10","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1007\/s11042-021-10990-1","volume":"81","author":"I Kalamaras","year":"2022","unstructured":"Kalamaras I, Glykos K, Megalooikonomou V, Votis K, Tzovaras D. Graph-based visualization of sensitive medical data. Multimedia Tools Appl. 2022;81(1):209\u201336. https:\/\/doi.org\/10.1007\/s11042-021-10990-1.","journal-title":"Multimedia Tools Appl"},{"key":"2112_CR11","first-page":"74","volume":"6","author":"H Patel","year":"2018","unstructured":"Patel H, Prajapati P. Study and analysis of decision tree based classification algorithms. Int J Comput Sci Eng. 2018;6:74\u20138.","journal-title":"Int J Comput Sci Eng"},{"key":"2112_CR12","unstructured":"Breiman L, Friedman J, Olshen R, Stone C. Cart: classification and regression trees (1984). Belmont, CA: Wadsworth; 1993."},{"key":"2112_CR13","doi-asserted-by":"crossref","unstructured":"Quinlan JR. Induction of decision trees. Machine Learning. 1986;1.","DOI":"10.1007\/BF00116251"},{"key":"2112_CR14","unstructured":"Quinlan J.R. Programs for machine learning, 1993."},{"key":"2112_CR15","doi-asserted-by":"crossref","unstructured":"Bramer M. Pre-pruning classification trees to reduce overfitting in noisy domains. In: Yin H, Allinson N, Freeman R, Keane J, Hubbard S editors Intelligent data engineering and automated learning\u2014IDEAL 2002, pp. 7\u201312. Springer, 2002.","DOI":"10.1007\/3-540-45675-9_2"},{"issue":"1","key":"2112_CR16","doi-asserted-by":"publisher","first-page":"81","DOI":"10.2337\/dc14-S081","volume":"37","author":"AD Association","year":"2013","unstructured":"Association AD. Diagnosis and classification of diabetes mellitus. Diabetes Care. 2013;37(1):81\u201390. https:\/\/doi.org\/10.2337\/dc14-S081.","journal-title":"Diabetes Care"},{"issue":"4","key":"2112_CR17","doi-asserted-by":"publisher","first-page":"380","DOI":"10.1016\/j.amjms.2016.01.011","volume":"351","author":"R Chen","year":"2016","unstructured":"Chen R, Ovbiagele B, Feng W. Diabetes and stroke: epidemiology, pathophysiology, pharmaceuticals and outcomes. Am J Med Sci. 2016;351(4):380\u20136. https:\/\/doi.org\/10.1016\/j.amjms.2016.01.011.","journal-title":"Am J Med Sci"},{"key":"2112_CR18","unstructured":"8 Databases supporting in-database machine learning. https:\/\/www.infoworld.com\/article\/3607762\/8-databases-supporting-in-database-machine-learning.html."},{"key":"2112_CR19","unstructured":"Dynamic Rule Based Decision Trees in Neo4j. https:\/\/maxdemarzi.com\/2018\/01\/14\/dynamic-rule-based-decision-trees-in-neo4j."},{"key":"2112_CR20","unstructured":"Neo4j Machine Learning Procedures. https:\/\/github.com\/neo4j-contrib\/neo4j-ml-procedures."},{"key":"2112_CR21","doi-asserted-by":"crossref","unstructured":"Anjana S, Lavanya K. An application of cypher query-based dynamic rule-based decision tree over suicide statistics dataset with neo4j. In: Intelligent IoT systems in personalized health care, pp. 293\u2013313 2021.","DOI":"10.1016\/B978-0-12-821187-8.00010-1"},{"issue":"1","key":"2112_CR22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-019-1002-x","volume":"20","author":"D Chicco","year":"2020","unstructured":"Chicco D, Jurman G. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Med Inf Decis Mak. 2020;20(1):1\u201316.","journal-title":"BMC Med Inf Decis Mak"},{"key":"2112_CR23","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1016\/j.jprot.2019.04.009","volume":"201","author":"T Lehmann","year":"2019","unstructured":"Lehmann T, Schallert K, Vilchez-Vargas R, Benndorf D, et al. Metaproteomics of fecal samples of crohn\u2019s disease and ulcerative colitis. J Proteomics. 2019;201:93\u2013103.","journal-title":"J Proteomics"},{"key":"2112_CR24","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-020-01266-z","author":"W Li","year":"2020","unstructured":"Li W, Ma J, Shende N, et al. Using machine learning of clinical data to diagnose covid-19: a systematic review and meta-analysis. BMC Med Inf Decis Mak. 2020. https:\/\/doi.org\/10.1186\/s12911-020-01266-z.","journal-title":"BMC Med Inf Decis Mak"},{"key":"2112_CR25","unstructured":"Diabetes Health Indicators Dataset. https:\/\/www.kaggle.com\/alexteboul\/diabetes-health-indicators-dataset."},{"key":"2112_CR26","unstructured":"Behavioral Risk Factor Surveillance System. https:\/\/www.kaggle.com\/cdc\/behavioral-risk-factor-surveillance-system."},{"issue":"10","key":"2112_CR27","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1109\/2.876288","volume":"33","author":"L Prechelt","year":"2000","unstructured":"Prechelt L. An empirical comparison of seven programming languages. Computer. 2000;33(10):23\u20139. https:\/\/doi.org\/10.1109\/2.876288.","journal-title":"Computer"},{"key":"2112_CR28","doi-asserted-by":"crossref","unstructured":"Sobhgol S, Durand G, L, R, Saake G. Machine learning within a graph database: A case study on link prediction for scholarly data. In: International conference on enterprise information systems, pp. 159\u2013166 2021.","DOI":"10.5220\/0010381901590166"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02112-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-023-02112-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-023-02112-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T16:06:52Z","timestamp":1704902812000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-023-02112-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,6]]},"references-count":28,"journal-issue":{"issue":"S6","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["2112"],"URL":"https:\/\/doi.org\/10.1186\/s12911-023-02112-8","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,6]]},"assertion":[{"value":"4 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 January 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 March 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable since this is a study involving the analysis of secondary data made publicly available online. Any approval from ethics and research committee involving human beings was not necessary, as there is no possibility of identifying any patient from the datasets.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"347"}}