{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T09:29:13Z","timestamp":1761643753989,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"type":"electronic","value":"9781643686158"}],"license":[{"start":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T00:00:00Z","timestamp":1756857600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,3]]},"abstract":"<jats:p>Introduction: The integration of Retrieval-Augmented Generation (RAG) into domain-specific systems enables context-aware and traceable information retrieval. This study explores chunking and embedding strategies for a RAG-based question-answering system tailored to administrative documents at University Hospital Halle, focusing on model selection, parameter tuning, and retrieval performance. The insights gained from this study should serve as the foundation for the future development of a Retrieval-Augmented Generation (RAG) based chatbot system that aims to facilitate access to document pool contents for hospital staff. Methods: A corpus of 1,219 documents was preprocessed and chunked using varied parameters, including soft\/hard character limits and overlaps. Eight embedding models were evaluated using Similarity Score and Maximum Marginal Relevance (MMR) retrievers. Top models Jinaai-v3 and Aari1995 were further analyzed across eight parameter configurations and ensemble retrievers using weight (w) and context (c) parameters. Results: Aari1995 reached the highest Top10 score (92.3%) with stable performance across chunk sizes and retriever configurations. Jinaai-v3 showed slightly stronger Top5 (84.6%) and Top3 (76.9%) scores but with greater sensitivity to parameter variations. Ensemble retrievers improved retrieval quality for both models, particularly when tuned via w-values. The c-parameter showed negligible influence. Runtime evaluation revealed that Jinaai-v3 generated vector stores more than four times faster than Aari1995. Overall, the similarity score retriever consistently outperformed MMR, both standalone and in ensemble configurations. Conclusion: Chunking and embedding choices significantly affect retrieval in domain-specific RAG systems. While both Jinaai-v3 and Aari1995 were effective, they differed in stability, accuracy, and efficiency. Findings support deploying a locally executable RAG system for administrative use, guiding future optimization of chunking and parameter robustness.<\/jats:p>","DOI":"10.3233\/shti251383","type":"book-chapter","created":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T09:24:10Z","timestamp":1761643450000},"source":"Crossref","is-referenced-by-count":0,"title":["Evaluation of Chunking and Embedding Strategies for Local Document Retrieval Using an Open-Source LLM in a Hospital"],"prefix":"10.3233","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-7946-8559","authenticated-orcid":false,"given":"Jan","family":"Bossenz","sequence":"first","affiliation":[{"name":"Junior Research Group (Bio-) Medical Data Science, Faculty of Medicine, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany"}]},{"given":"Carlo","family":"G\u00fcnzl","sequence":"additional","affiliation":[{"name":"Junior Research Group (Bio-) Medical Data Science, Faculty of Medicine, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany"}]},{"given":"Fabian","family":"Berns","sequence":"additional","affiliation":[{"name":"medicalvalues GmbH, Karlsruhe, Germany"}]},{"given":"Annemarie","family":"Weise","sequence":"additional","affiliation":[{"name":"Junior Research Group (Bio-) Medical Data Science, Faculty of Medicine, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany"}]},{"given":"Christian","family":"J\u00e4ger","sequence":"additional","affiliation":[{"name":"Junior Research Group (Bio-) Medical Data Science, Faculty of Medicine, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany"}]},{"given":"Jan","family":"Kirchhoff","sequence":"additional","affiliation":[{"name":"medicalvalues GmbH, Karlsruhe, Germany"}]},{"given":"Jan","family":"Christoph","sequence":"additional","affiliation":[{"name":"Junior Research Group (Bio-) Medical Data Science, Faculty of Medicine, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4319-3175","authenticated-orcid":false,"given":"Christoph","family":"Demus","sequence":"additional","affiliation":[{"name":"Junior Research Group (Bio-) Medical Data Science, Faculty of Medicine, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany"}]}],"member":"7437","container-title":["Studies in Health Technology and Informatics","German Medical Data Sciences 2025: GMDS Illuminates Health"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/SHTI251383","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T09:24:10Z","timestamp":1761643450000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/SHTI251383"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,3]]},"ISBN":["9781643686158"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/shti251383","relation":{},"ISSN":["0926-9630","1879-8365"],"issn-type":[{"type":"print","value":"0926-9630"},{"type":"electronic","value":"1879-8365"}],"subject":[],"published":{"date-parts":[[2025,9,3]]}}}