{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T21:17:34Z","timestamp":1774387054644,"version":"3.50.1"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T00:00:00Z","timestamp":1762128000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T00:00:00Z","timestamp":1762128000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Albert-Ludwigs-Universit\u00e4t Freiburg im Breisgau"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J CARS"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>\n                      <jats:bold>Purpose:<\/jats:bold>\n                    <\/jats:title>\n                    <jats:p>Bayesian networks (BNs) are valuable for clinical decision support due to their transparency and interpretability. However, BN modelling requires considerable manual effort. This study explores how integrating large language models (LLMs) with retrieval-augmented generation (RAG) can improve BN modelling by increasing efficiency, reducing cognitive workload, and ensuring accuracy.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>\n                      <jats:bold>Methods:<\/jats:bold>\n                    <\/jats:title>\n                    <jats:p>We developed a web-based BN modelling service that integrates an LLM-RAG pipeline. A fine-tuned GTE-Large embedding model was employed for knowledge retrieval, optimised through recursive chunking and query expansion. 
To ensure accurate BN suggestions, we defined a causal structure for medical idioms by unifying existing BN frameworks. GPT-4 and Mixtral 8x7B were used to handle complex data interpretation and to generate modelling suggestions, respectively. A user study with four clinicians assessed usability, retrieval accuracy, and cognitive workload using NASA-TLX. The study demonstrated the system\u2019s potential for efficient and clinically relevant BN modelling.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>\n                      <jats:bold>Results:<\/jats:bold>\n                    <\/jats:title>\n                    <jats:p>\n                      The RAG pipeline improved retrieval accuracy and answer relevance. Recursive chunking with the fine-tuned embedding model GTE-Large achieved the highest retrieval accuracy score\u00a0(0.9). Query expansion and Hyde optimisation enhanced retrieval accuracy for semantic chunking\u00a0(0.75 to 0.85). Responses maintained high faithfulness\u00a0(\n                      <jats:inline-formula>\n                        <jats:alternatives>\n                          <jats:tex-math>$$\\ge $$<\/jats:tex-math>\n                          <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                            <mml:mo>\u2265<\/mml:mo>\n                          <\/mml:math>\n                        <\/jats:alternatives>\n                      <\/jats:inline-formula>\n                      0.9). However, the LLM occasionally failed to adhere to predefined causal structures and medical idioms. All clinicians, regardless of BN experience, created comprehensive models within one hour. Experienced clinicians produced more complex models, but occasionally introduced causality errors, while less experienced users adhered more accurately to predefined structures. 
The tool reduced cognitive workload\u00a0(2\/7 NASA-TLX) and was described as intuitive, although workflow interruptions and minor technical issues highlighted areas for improvement.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>\n                      <jats:bold>Conclusion:<\/jats:bold>\n                    <\/jats:title>\n                    <jats:p>Integrating LLM-RAG into BN modelling enhances efficiency and accuracy. Future work may focus on automated preprocessing, refinements of the user interface, and extending the RAG pipeline with validation steps and external biomedical sources. Generative AI holds promise for expert-driven knowledge modelling.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1007\/s11548-025-03524-9","type":"journal-article","created":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T12:01:12Z","timestamp":1762171272000},"page":"211-222","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Large language models with retrieval-augmented generation enhance expert modelling of Bayesian network for clinical decision support"],"prefix":"10.1007","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2944-9357","authenticated-orcid":false,"given":"Mario A.","family":"Cypko","sequence":"first","affiliation":[]},{"given":"Muhammad 
Agus","family":"Salim","sequence":"additional","affiliation":[]},{"given":"Aditya","family":"Kumar","sequence":"additional","affiliation":[]},{"given":"Leonard","family":"Berliner","sequence":"additional","affiliation":[]},{"given":"Andreas","family":"Dietz","sequence":"additional","affiliation":[]},{"given":"Matthaeus","family":"Stoehr","sequence":"additional","affiliation":[]},{"given":"Oliver","family":"Amft","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,11,3]]},"reference":[{"issue":"9","key":"3524_CR1","doi-asserted-by":"publisher","first-page":"1513","DOI":"10.1007\/s11548-022-02731-y","volume":"17","author":"HU Lemke","year":"2022","unstructured":"Lemke HU (2022) Moving from data, information, knowledge and models to wisdom-based decision making in the domain of computer assisted radiology and surgery (cars). Int J Comput Assist Radiol Surg 17(9):1513\u20131517","journal-title":"Int J Comput Assist Radiol Surg"},{"key":"3524_CR2","first-page":"1","volume":"9","author":"MA Cypko","year":"2024","unstructured":"Cypko MA, Wilhelm D (2024) Ladies and gentlemen! this is no humbug. why model-guided medicine will become a main pillar for the future healthcare system. Int J Comput Assisted Radiol Sur 9:1\u20139","journal-title":"Int J Comput Assisted Radiol Sur"},{"key":"3524_CR3","unstructured":"Pearl J (1985) Bayesian networks: A model of self-activated memory for evidential reasoning. In: Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, USA, pp. 
15\u201317"},{"issue":"5","key":"3524_CR4","doi-asserted-by":"publisher","first-page":"117","DOI":"10.3390\/jimaging10050117","volume":"10","author":"G Nicora","year":"2024","unstructured":"Nicora G, Catalano M, Bortolotto C, Achilli MF, Messana G, Lo Tito A, Consonni A, Cutti S, Comotto F, Stella GM (2024) Bayesian networks in the management of hospital admissions: a comparison between explainable ai and black box ai during the pandemic. J Imag 10(5):117","journal-title":"J Imag"},{"key":"3524_CR5","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2020.101912","volume":"107","author":"S McLachlan","year":"2020","unstructured":"McLachlan S, Dube K, Hitman GA, Fenton NE, Kyrimi E (2020) Bayesian networks in healthcare: distribution by medical condition. Artif Intell Med 107:101912","journal-title":"Artif Intell Med"},{"issue":"8","key":"3524_CR6","doi-asserted-by":"publisher","first-page":"8721","DOI":"10.1007\/s10462-022-10351-w","volume":"56","author":"NK Kitson","year":"2023","unstructured":"Kitson NK, Constantinou AC, Guo Z, Liu Y, Chobtham K (2023) A survey of bayesian network structure learning. Artif Intell Rev 56(8):8721\u20138814","journal-title":"Artif Intell Rev"},{"key":"3524_CR7","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1016\/j.artmed.2016.01.002","volume":"67","author":"AC Constantinou","year":"2016","unstructured":"Constantinou AC, Fenton N, Marsh W, Radlinski L (2016) From complex questionnaire and interviewing data to intelligent bayesian network models for medical decision support. Artif Intell Med 67:75\u201393","journal-title":"Artif Intell Med"},{"key":"3524_CR8","doi-asserted-by":"crossref","unstructured":"Spirtes P, Glymour C, Scheines R (1993) Discovery algorithms for causally sufficient structures. Causation, prediction, and search, 103\u2013162","DOI":"10.1007\/978-1-4612-2748-9_5"},{"key":"3524_CR9","unstructured":"BayesFusion L (2017) GeNIe Modeler - User Manual. 
Accessed: 09.01.2025. https:\/\/support.bayesfusion.com\/docs\/"},{"key":"3524_CR10","first-page":"842","volume":"2","author":"A Onisko","year":"1999","unstructured":"Onisko A, Druzdzel MJ, Wasyluk H (1999) A bayesian network model for diagnosis of liver disorders. Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering 2:842\u2013846","journal-title":"Proceedings of the Eleventh Conference on Biocybernetics and Biomedical Engineering"},{"issue":"11","key":"3524_CR11","doi-asserted-by":"publisher","first-page":"1959","DOI":"10.1007\/s11548-017-1531-7","volume":"12","author":"MA Cypko","year":"2017","unstructured":"Cypko MA, Stoehr M, Kozniewski M, Druzdzel MJ, Dietz A, Berliner L, Lemke HU (2017) Validation workflow for a clinical bayesian network model in multidisciplinary decision making in head and neck oncology treatment. Int J Comput Assist Radiol Surg 12(11):1959\u20131970","journal-title":"Int J Comput Assist Radiol Surg"},{"key":"3524_CR12","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1609\/aaaiss.v4i1.31764","volume":"4","author":"S Ashwani","year":"2024","unstructured":"Ashwani S, Hegde K, Mannuru NR, Sengar DS, Jindal M, Kathala KCR, Banga D, Jain V, Chadha A (2024) Cause and effect: can large language models truly understand causality? Proceedings of the AAAI Symposium Series 4:2\u20139","journal-title":"Proceedings of the AAAI Symposium Series"},{"key":"3524_CR13","doi-asserted-by":"crossref","unstructured":"Wang J, Cao D, Lu S, Ma Z, Xiao J, Chua T-S (2024) Causal-driven large language models with faithful reasoning for knowledge question answering. In: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 4331\u20134340","DOI":"10.1145\/3664647.3681263"},{"key":"3524_CR14","doi-asserted-by":"crossref","unstructured":"Zhou S, Lin M, Ding S, Wang J, Chen C, Melton GB, Zou J, Zhang R (2025) Explainable differential diagnosis with dual-inference large language models. npj Health Syst. 
2(1):12","DOI":"10.1038\/s44401-025-00015-6"},{"issue":"1","key":"3524_CR15","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ada47f","volume":"6","author":"K-H Cohrs","year":"2025","unstructured":"Cohrs K-H, Diaz E, Sitokonstantinou V, Varando G, Camps-Valls G (2025) Large language models for causal hypothesis generation in science. Mach Learn Sci Technol 6(1):013001","journal-title":"Mach Learn Sci Technol"},{"key":"3524_CR16","first-page":"9459","volume":"33","author":"P Lewis","year":"2020","unstructured":"Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, K\u00fcttler H, Lewis M, Yih W-T, Rockt\u00e4schel T (2020) Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv Neural Inf Process Syst 33:9459\u20139474","journal-title":"Adv Neural Inf Process Syst"},{"key":"3524_CR17","doi-asserted-by":"crossref","unstructured":"Xia P, Zhu K, Li H, Wang T, Shi W, Wang S, Zhang L, Zou J, Yao H (2024) Mmed-rag: Versatile multimodal rag system for medical vision language models. arXiv preprint arXiv:2410.13085","DOI":"10.18653\/v1\/2024.emnlp-main.62"},{"key":"3524_CR18","unstructured":"OpenAI: GPT-4 - OpenAI platform (2024). https:\/\/platform.openai.com\/"},{"key":"3524_CR19","unstructured":"Jiang AQ, Sablayrolles A, Roux A, Mensch A, Savary B, Bamford C, Chaplot DS, Casas Ddl, Hanna EB, Bressand F et al (2024) Mixtral of experts. arXiv preprint arXiv:2401.04088"},{"key":"3524_CR20","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2020.103495","volume":"108","author":"E Kyrimi","year":"2020","unstructured":"Kyrimi E, Neves MR, McLachlan S, Neil M, Marsh W, Fenton N (2020) Medical idioms for clinical bayesian network development. J Biomed Inform 108:103495","journal-title":"J Biomed Inform"},{"key":"3524_CR21","unstructured":"Network NCC (2024) Guidelines for Treatment of Cancer by Type, NCCN, Recently Updated Guideline. Accessed: 01.11.2024. 
https:\/\/www.nccn.org\/guidelines\/recently-published-guidelines"},{"issue":"11","key":"3524_CR22","doi-asserted-by":"publisher","first-page":"1052","DOI":"10.1007\/s00117-020-00760-9","volume":"60","author":"F Bootz","year":"2020","unstructured":"Bootz F (2020) S3-leitlinie diagnostik, therapie und nachsorge des larynxkarzinoms. Radiologe 60(11):1052\u20131057","journal-title":"Radiologe"},{"key":"3524_CR23","unstructured":"Surgeons AC (2024) AJCC 9th version of cancer staging system. Accessed: 01.10.2024. https:\/\/www.facs.org\/quality-programs\/cancer-programs\/american-joint-committee-on-cancer\/version-9\/"},{"key":"3524_CR24","doi-asserted-by":"crossref","unstructured":"Pan L, Yao H, Li Z, Ren Y (2022) A code error correction system for pdf documents using regex and similarity matching. In: 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), vol. 10, pp. 2436\u20132440. IEEE","DOI":"10.1109\/ITAIC54216.2022.9836781"},{"key":"3524_CR25","unstructured":"Artifex: Pymupdf4llm API. Accessed: 09.01.2025 (2024). https:\/\/pymupdf.readthedocs.io\/en\/latest\/pymupdf4llm\/"},{"key":"3524_CR26","doi-asserted-by":"crossref","unstructured":"Gao L, Ma X, Lin J, Callan J (2023) Precise zero-shot dense retrieval without relevance labels. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1762\u20131777","DOI":"10.18653\/v1\/2023.acl-long.99"},{"key":"3524_CR27","unstructured":"Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597\u20131607. PMLR"},{"key":"3524_CR28","doi-asserted-by":"crossref","unstructured":"Yu H, Gan A, Zhang K, Tong S, Liu Q, Liu Z (2024) Evaluation of retrieval-augmented generation: A survey. 
arXiv preprint arXiv:2405.07437","DOI":"10.1007\/978-981-96-1024-2_8"},{"key":"3524_CR29","doi-asserted-by":"crossref","unstructured":"Br\u00e5dland H, Goodwin M, Andersen P-A, Nossum AS, Gupta A (2025) A new hope: Domain-agnostic automatic evaluation of text chunking. In: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 170\u2013179","DOI":"10.1145\/3726302.3729882"},{"key":"3524_CR30","doi-asserted-by":"publisher","first-page":"17754","DOI":"10.1609\/aaai.v38i16.29728","volume":"38","author":"J Chen","year":"2024","unstructured":"Chen J, Lin H, Han X, Sun L (2024) Benchmarking large language models in retrieval-augmented generation. Proceedings of the AAAI Conference on Artificial Intelligence 38:17754\u201317762","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"3524_CR31","doi-asserted-by":"crossref","unstructured":"Kukreja S, Kumar T, Bharate V, Purohit A, Dasgupta A, Guha D (2024) Performance evaluation of vector embeddings with retrieval-augmented generation. In: 2024 9th International Conference on Computer and Communication Systems (ICCCS), pp. 333\u2013340. IEEE","DOI":"10.1109\/ICCCS61882.2024.10603291"},{"key":"3524_CR32","doi-asserted-by":"crossref","unstructured":"Zhang W, Liu Z, Wang K, Lian S (2024) Query expansion and verification with large language model for information retrieval. In: International Conference on Intelligent Computing, pp. 341\u2013351. Springer","DOI":"10.1007\/978-981-97-5672-8_29"},{"key":"3524_CR33","doi-asserted-by":"crossref","unstructured":"Hart SG (2006) Nasa-task load index (nasa-tlx); 20 years later. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 50, pp. 904\u2013908. 
Sage Publications, Los Angeles, CA","DOI":"10.1177\/154193120605000909"},{"key":"3524_CR34","doi-asserted-by":"crossref","unstructured":"Nan H, Marx M, Wolswinkel J (2024) Combining rule-based and machine learning methods for efficient information extraction from enforcement decisions. Legal Knowledge and Information Systems, 321\u2013326","DOI":"10.3233\/FAIA241262"},{"key":"3524_CR35","first-page":"46595","volume":"36","author":"L Zheng","year":"2023","unstructured":"Zheng L, Chiang W-L, Sheng Y, Zhuang S, Wu Z, Zhuang Y, Lin Z, Li Z, Li D, Xing E (2023) Judging llm-as-a-judge with mt-bench and chatbot arena. Adv Neural Inf Process Syst 36:46595\u201346623","journal-title":"Adv Neural Inf Process Syst"},{"key":"3524_CR36","doi-asserted-by":"crossref","unstructured":"Papadopoulos P, Soflano M, Chaudy Y, Adejo W, Connolly TM (2022) A systematic review of technologies and standards used in the development of rule-based clinical decision support systems. Health Technol 12(4):713\u2013727","DOI":"10.1007\/s12553-022-00672-9"}],"container-title":["International Journal of Computer Assisted Radiology and 
Surgery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-025-03524-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11548-025-03524-9","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11548-025-03524-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T18:02:43Z","timestamp":1774375363000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11548-025-03524-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,3]]},"references-count":36,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["3524"],"URL":"https:\/\/doi.org\/10.1007\/s11548-025-03524-9","relation":{},"ISSN":["1861-6429"],"issn-type":[{"value":"1861-6429","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,3]]},"assertion":[{"value":"21 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 September 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 November 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors Mario Cypko, Muhammad Agus Salim, Aditya Kumar, Leonard Berliner, Andreas Dietz, Matthaeus Stoehr, and Oliver Amft declare that they have no conflict of interest related to this work.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of 
interest"}},{"value":"This study involved usability testing of a web-based Bayesian network modelling tool with four clinicians from the University Hospital Leipzig. The research did not involve patient data or clinical interventions. Based on institutional and national regulations, ethical approval was not required as the study focused solely on tool usability and interaction data. Informed consent was obtained from all individual participants included in the study. Participants were clinicians from the University Hospital Leipzig, who voluntarily agreed to take part in the usability study. Detailed consent forms were provided and signed, outlining the scope of data collection, the use of recordings, and publication-related aspects.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed Consent"}}]}}