{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T17:09:38Z","timestamp":1754154578645,"version":"3.41.2"},"reference-count":74,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["FKZ 01ZZ2106A","FKZ 01ZZ2106A"],"award-info":[{"award-number":["FKZ 01ZZ2106A","FKZ 01ZZ2106A"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100014584","name":"Universit\u00e4tsmedizin der Johannes Gutenberg-Universit\u00e4t Mainz","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100014584","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BioData Mining"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Tumor documentation in Germany is currently a largely manual process. It involves reading the textual patient documentation and filling in forms in dedicated databases to obtain structured data. Advances in information extraction techniques that build on large language models (LLMs) could have the potential for enhancing the efficiency and reliability of this process. Evaluating LLMs in the German medical domain, especially their ability to interpret specialized language, is essential to determine their suitability for the use in clinical documentation. Due to data protection regulations, only locally deployed open source LLMs are generally suitable for this application.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>The evaluation employs eleven different open source LLMs with sizes ranging from 1 to 70 billion model parameters. Three basic tasks were selected as representative examples for the tumor documentation process: identifying tumor diagnoses, assigning ICD-10 codes, and extracting the date of first diagnosis. For evaluating the LLMs on these tasks, a dataset of annotated text snippets based on anonymized doctors\u2019 notes from urology was prepared. Different prompting strategies were used to investigate the effect of the number of examples in few-shot prompting and to explore the capabilities of the LLMs in general.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>The models Llama 3.1 8B, Mistral 7B, and Mistral NeMo 12 B performed comparably well in the tasks. Models with less extensive training data or having fewer than 7 billion parameters showed notably lower performance, while larger models did not display performance gains. 
Examples from a medical domain other than urology could also improve the outcome in few-shot prompting, which demonstrates the ability of LLMs to handle tasks needed for tumor documentation.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Open source LLMs show a strong potential for automating tumor documentation. Models with 7\u201312 billion parameters could offer an optimal balance between performance and resource efficiency. With tailored fine-tuning and well-designed prompting, these models might become important tools for clinical documentation in the future. The code for the evaluation is available from <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/stefan-m-lenz\/UroLlmEval\" ext-link-type=\"uri\">https:\/\/github.com\/stefan-m-lenz\/UroLlmEval<\/jats:ext-link>. We also release the dataset under <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/huggingface.co\/datasets\/stefan-m-lenz\/UroLlmEvalSet\" ext-link-type=\"uri\">https:\/\/huggingface.co\/datasets\/stefan-m-lenz\/UroLlmEvalSet<\/jats:ext-link>, providing a valuable resource that addresses the shortage of authentic and easily accessible benchmarks in German-language medical NLP.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/s13040-025-00463-8","type":"journal-article","created":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T13:58:58Z","timestamp":1753365538000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Can open source large language models be used for tumor documentation in Germany?\u2014An evaluation on urological doctors\u2019 notes"],"prefix":"10.1186","volume":"18","author":[{"given":"Stefan","family":"Lenz","sequence":"first","affiliation":[]},{"given":"Arsenij","family":"Ustjanzew","sequence":"additional","affiliation":[]},{"given":"Marco","family":"Jeray","sequence":"additional","affiliation":[]},{"given":"Meike","family":"Ressing","sequence":"additional","affiliation":[]},{"given":"Torsten","family":"Panholzer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,24]]},"reference":[{"key":"463_CR1","unstructured":"Holland J. Bekanntmachung - Aktualisierter einheitlicher onkologischer Basisdatensatz der Arbeitsgemeinschaft Deutscher Tumorzentren e. V. (ADT) und der Gesellschaft der epidemiologischen Krebsregister in Deutschland e. V. (GEKID). Bundesanzeiger. 2021;BAnz AT 12.07.2021 B4."},{"key":"463_CR2","doi-asserted-by":"publisher","unstructured":"Klinkhammer-Schalke M, Kleihues van Tol K, Jurkschat R, Meyer M, Katalinic A, Holleczek B, et al. Der einheitliche onkologische Basisdatensatz (oBDS). Forum. 2024 [cited 2025 May 5];39:191\u20135. Available from: https:\/\/doi.org\/10.1007\/s12312-024-01320-1.","DOI":"10.1007\/s12312-024-01320-1"},{"key":"463_CR3","unstructured":"Bundesinstitut f\u00fcr Arzneimittel und Medizinprodukte. ICD-10-GM Version 2024, Systematisches Verzeichnis, Internationale statistische Klassifikation der Krankheiten und verwandter Gesundheitsprobleme, 10. Revision. 2024 [cited 2024 Jun 12]. Available from: https:\/\/www.bfarm.de\/DE\/Kodiersysteme\/Klassifikationen\/ICD\/ICD-10-GM\/_node.html."},{"key":"463_CR4","unstructured":"OpenAI, Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, et al. GPT-4 Technical Report. arXiv; 2024 [cited 2024 Oct 24].
Available from: http:\/\/arxiv.org\/abs\/2303.08774."},{"key":"463_CR5","doi-asserted-by":"crossref","unstructured":"Xu FF, Alon U, Neubig G, Hellendoorn VJ. A systematic evaluation of large language models of code. Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming. New York, NY, USA: Association for Computing Machinery; 2022 [cited 2025 Jan 9]. p. 1\u201310. Available from: https:\/\/dl.acm.org\/doi\/10.1145\/3520312.3534862.","DOI":"10.1145\/3520312.3534862"},{"key":"463_CR6","doi-asserted-by":"crossref","unstructured":"Zhong L, Wang Z, Shang J. Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step by Step. In: Ku L-W, Martins A, Srikumar V, editors. Findings of the Association for Computational Linguistics ACL 2024. Bangkok, Thailand and virtual meeting: Association for Computational Linguistics; 2024 [cited 2024 Oct 24]. p. 851\u201370. Available from: https:\/\/aclanthology.org\/2024.findings-acl.49.","DOI":"10.18653\/v1\/2024.findings-acl.49"},{"key":"463_CR7","doi-asserted-by":"crossref","unstructured":"Laukamp KR, Terzis RA, Werner J-M, Galldiks N, Lennartz S, Maintz D, et al. Monitoring Patients with Glioblastoma by Using a Large Language Model: Accurate Summarization of Radiology Reports with GPT-4. Radiology. 2024 [cited 2024 Oct 24];312:e232640. Available from: https:\/\/pubs.rsna.org\/doi\/10.1148\/radiol.232640.","DOI":"10.1148\/radiol.232640"},{"key":"463_CR8","doi-asserted-by":"crossref","unstructured":"Adams LC, Truhn D, Busch F, Kader A, Niehues SM, Makowski MR, et al. Leveraging GPT-4 for Post Hoc Transformation of Free-Text Radiology Reports into Structured Reporting: A Multilingual Feasibility Study. Radiology. 2023 [cited 2023 Apr 12];230725. Available from: https:\/\/pubs.rsna.org\/doi\/10.1148\/radiol.230725.","DOI":"10.1148\/radiol.230725"},{"key":"463_CR9","doi-asserted-by":"publisher","unstructured":"Lenz S, Ustjanzew A, Jeray M, Panholzer T. Few-Shot-Prompting von Large Language Models zur Extraktion von Daten zu Tumordiagnosen aus urologischen Arztbriefen \u2013 eine Evaluation. GMDS Kooperationstagung \u201cGesundheit - gemeinsam\u201d 2024. German Medical Science GMS Publishing House; 2024. p. DocAbstr. 650. Available from: https:\/\/doi.org\/10.3205\/24gmds179.","DOI":"10.3205\/24gmds179"},{"key":"463_CR10","doi-asserted-by":"crossref","unstructured":"Wiest IC, Ferber D, Zhu J, van Treeck M, Meyer SK, Juglan R, et al. Privacy-preserving large language models for structured medical information retrieval. npj Digit Med. 2024 [cited 2024 Sep 26];7:1\u20139. Available from: https:\/\/www.nature.com\/articles\/s41746-024-01233-2.","DOI":"10.1038\/s41746-024-01233-2"},{"key":"463_CR11","unstructured":"Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, von Arx S, et al. On the Opportunities and Risks of Foundation Models. arXiv:210807258 [cs]. 2021 [cited 2021 Sep 21]; Available from: https:\/\/crfm.stanford.edu\/assets\/report.pdf."},{"key":"463_CR12","doi-asserted-by":"crossref","unstructured":"Weicken E, Mittermaier M, Hoeren T, Kliesch J, Wiegand T, Witzenrath M, et al. Schwerpunkt k\u00fcnstliche Intelligenz in der Medizin \u2013\u00a0rechtliche Aspekte bei der Nutzung gro\u00dfer Sprachmodelle im klinischen Alltag. Inn Med (Heidelb). 2025 [cited 2025 Apr 7];66:436\u201341. Available from: https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC11965224\/.","DOI":"10.1007\/s00108-025-01861-0"},{"key":"463_CR13","unstructured":"Federal Republic of Germany.
Gesetz zur Zusammenf\u00fchrung von Krebsregisterdaten [Law on the Consolidation of Cancer Registry Data]. Bundesgesetzblatt Jahrgang 2021 Teil I Nr. 59. Aug 18, 2021. Available from: https:\/\/dip.bundestag.de\/vorgang\/gesetz-zur-zusammenf%C3%BChrung-von-krebsregisterdaten\/273932."},{"key":"463_CR14","unstructured":"Gallifant J, Afshar M, Ameen S, Aphinyanaphongs Y, Chen S, Cacciamani G, et al. The TRIPOD-LLM reporting guideline for studies using large language models. Nat Med. 2025 [cited 2025 Jan 9];1\u201310. Available from: https:\/\/www.nature.com\/articles\/s41591-024-03425-5."},{"key":"463_CR15","doi-asserted-by":"publisher","unstructured":"Frank J, Merseburger AS, Landmesser J, Brozat-Essen S, Schramm P, Freimann L, et al. [Large Language Models for Rapid Simplification of Quality Assurance Data Input: Field Trial with Real Data in the Context of Tumour Documentation in Urology]. Aktuelle Urol. 2024 [cited 2024 Oct 29];55:415\u201323. Available from: https:\/\/doi.org\/10.1055\/a-2281-8015.","DOI":"10.1055\/a-2281-8015"},{"key":"463_CR16","doi-asserted-by":"crossref","unstructured":"Liu NF, Lin K, Hewitt J, Paranjape A, Bevilacqua M, Petroni F, et al. Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics. 2024 [cited 2025 Apr 17];12:157\u201373. Available from: https:\/\/aclanthology.org\/2024.tacl-1.9\/.","DOI":"10.1162\/tacl_a_00638"},{"key":"463_CR17","doi-asserted-by":"crossref","unstructured":"Seabold S, Perktold J. statsmodels: Econometric and statistical modeling with python. 9th Python in Science Conference. 2010.","DOI":"10.25080\/Majora-92bf1922-011"},{"key":"463_CR18","unstructured":"Castro S. Fast Krippendorff: Fast computation of Krippendorff\u2019s alpha agreement measure. GitHub repository. GitHub; 2017. Available from: https:\/\/github.com\/pln-fing-udelar\/fast-krippendorff."},{"key":"463_CR19","doi-asserted-by":"crossref","unstructured":"Landis JR, Koch GG. The Measurement of Observer Agreement for Categorical Data. Biometrics. 1977 [cited 2025 May 2];33:159\u201374. Available from: https:\/\/www.jstor.org\/stable\/2529310.","DOI":"10.2307\/2529310"},{"key":"463_CR20","unstructured":"Artifex Software Inc. PyMuPDF documentation. 2025 [cited 2025 Apr 23]. Available from: https:\/\/pymupdf.readthedocs.io."},{"key":"463_CR21","unstructured":"Cohere For AI. Model card of Command R+ on HuggingFace. 2024 [cited 2024 Apr 8]. Available from: https:\/\/huggingface.co\/CohereForAI\/c4ai-command-r-plus."},{"key":"463_CR22","unstructured":"Dubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, Letman A, et al. The Llama 3 Herd of Models. arXiv; 2024 [cited 2024 Aug 1]. Available from: http:\/\/arxiv.org\/abs\/2407.21783."},{"key":"463_CR23","unstructured":"Jiang AQ, Sablayrolles A, Mensch A, Bamford C, Chaplot DS, Casas D de las, et al. Mistral 7B. arXiv; 2023 [cited 2024 Jan 3]. Available from: http:\/\/arxiv.org\/abs\/2310.06825."},{"key":"463_CR24","doi-asserted-by":"crossref","unstructured":"Labrak Y, Bazoge A, Morin E, Gourraud P-A, Rouvier M, Dufour R. BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains. arXiv; 2024 [cited 2024 Sep 24]. Available from: http:\/\/arxiv.org\/abs\/2402.10373.","DOI":"10.18653\/v1\/2024.findings-acl.348"},{"key":"463_CR25","unstructured":"Pl\u00fcster B, Schuhmann C. Model card of LeoLM 7B on HuggingFace. 2023 [cited 2024 Sep 25]. 
Available from: https:\/\/huggingface.co\/LeoLM\/leo-hessianai-7b."},{"key":"463_CR26","unstructured":"Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv; 2023 [cited 2024 Sep 26]. Available from: http:\/\/arxiv.org\/abs\/2307.09288."},{"key":"463_CR27","unstructured":"Golchinfar D, Vaziri D, Hennekeuser D, Fernandes Neto F, Atkins L, Marquardt A. VAGOsolutions - organization card on HuggingFace. 2024 [cited 2024 Sep 4]. Available from: https:\/\/huggingface.co\/VAGOsolutions."},{"key":"463_CR28","unstructured":"Jiang A, Sablayrolles A, Tacnet A, Kothari A, Roux A, Mensch A, et al. Model card of Mistral Nemo on HuggingFace. 2024 [cited 2024 Sep 25]. Available from: https:\/\/huggingface.co\/mistralai\/Mistral-Nemo-Instruct-2407."},{"key":"463_CR29","unstructured":"Jiang AQ, Sablayrolles A, Roux A, Mensch A, Savary B, Bamford C, et al. Mixtral of Experts. arXiv; 2024 [cited 2024 Jan 9]. Available from: http:\/\/arxiv.org\/abs\/2401.04088."},{"key":"463_CR30","unstructured":"Martins PH, Fernandes P, Alves J, Guerreiro NM, Rei R, Alves DM, et al. EuroLLM: Multilingual Language Models for Europe. arXiv; 2024 [cited 2024 Sep 26]. Available from: http:\/\/arxiv.org\/abs\/2409.16235."},{"key":"463_CR31","unstructured":"Meta, Inc. Model card of Llama 3.2 1B on Hugging Face. 2024 [cited 2024 Nov 21]. Available from: https:\/\/huggingface.co\/meta-llama\/Llama-3.2-1B-Instruct."},{"key":"463_CR32","unstructured":"Meta, Inc. Model card of Llama 3.2 3B on Hugging Face. 2024 [cited 2024 Nov 21]. Available from: https:\/\/huggingface.co\/meta-llama\/Llama-3.2-3B-Instruct."},{"key":"463_CR33","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv; 2019 [cited 2024 Oct 17]. Available from: http:\/\/arxiv.org\/abs\/1810.04805."},{"key":"463_CR34","unstructured":"Sanh V, Webson A, Raffel C, Bach S, Sutawika L, Alyafeai Z, et al. Multitask Prompted Training Enables Zero-Shot Task Generalization. International Conference on Learning Representations. 2022. Available from: https:\/\/openreview.net\/forum?id=9Vrb9D0WI4."},{"key":"463_CR35","unstructured":"Martins PH, Fernandes P, Alves J, Guerreiro NM, Rei R, Alves DM, et al. Model card of EuroLLM 1.7B on Hugging Face. 2024 [cited 2025 Jan 15]. Available from: https:\/\/huggingface.co\/utter-project\/EuroLLM-1.7B-Instruct."},{"key":"463_CR36","unstructured":"Labrak Y, Bazoge A, Morin E, Gourraud P-A, Rouvier M, Dufour R. Model card of BioMistral on Hugging Face. 2024 [cited 2025 Jan 15]. Available from: https:\/\/huggingface.co\/BioMistral\/BioMistral-7B."},{"key":"463_CR37","unstructured":"Jiang AQ, Sablayrolles A, Mensch A, Bamford C, Chaplot DS, Casas D de las, et al. Model card of Mistral 7B on Hugging Face. 2025 [cited 2025 Jan 15]. Available from: https:\/\/huggingface.co\/mistralai\/Mistral-7B-Instruct-v0.3."},{"key":"463_CR38","unstructured":"Meta, Inc. Model card of Llama 3.1 8B on Hugging Face. 2024 [cited 2025 Jan 15]. Available from: https:\/\/huggingface.co\/meta-llama\/Llama-3.1-8B-Instruct."},{"key":"463_CR39","unstructured":"Golchinfar D, Vaziri D, Hennekeuser D, Fernandes Neto F, Atkins L, Marquardt A. Model card of Llama 3.1 8B SauerkrautLM on HuggingFace. 2024 [cited 2025 Jan 15]. 
Available from: https:\/\/huggingface.co\/VAGOsolutions\/Llama-3.1-SauerkrautLM-8b-Instruct."},{"key":"463_CR40","unstructured":"Jiang AQ, Sablayrolles A, Mensch A, Bamford C, Chaplot DS, Casas D de las, et al. Model card of Mixtral 8x7B on Hugging Face. 2025 [cited 2025 Jan 15]. Available from: https:\/\/huggingface.co\/mistralai\/Mixtral-8x7B-Instruct-v0.1."},{"key":"463_CR41","unstructured":"Meta, Inc. Model card of Llama 3.1 70B on Hugging Face. 2024 [cited 2025 Jan 15]. Available from: https:\/\/huggingface.co\/meta-llama\/Llama-3.1-70B-Instruct."},{"key":"463_CR42","unstructured":"Golchinfar D, Vaziri D, Hennekeuser D, Fernandes Neto F, Atkins L, Marquardt A. Model card of Llama 3 8B SauerkrautLM on HuggingFace. 2024 [cited 2025 Jan 15]. Available from: https:\/\/huggingface.co\/VAGOsolutions\/Llama-3-SauerkrautLM-8b-Instruct."},{"key":"463_CR43","unstructured":"Zhao Z, Wallace E, Feng S, Klein D, Singh S. Calibrate Before Use: Improving Few-shot Performance of Language Models. In: Meila M, Zhang T, editors. Proceedings of the 38th International Conference on Machine Learning. PMLR; 2021. p. 12697\u2013706. Available from: https:\/\/proceedings.mlr.press\/v139\/zhao21c.html."},{"key":"463_CR44","doi-asserted-by":"crossref","unstructured":"Fei Y, Hou Y, Chen Z, Bosselut A. Mitigating Label Biases for In-context Learning. In: Rogers A, Boyd-Graber J, Okazaki N, editors. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Toronto, Canada: Association for Computational Linguistics; 2023 [cited 2025 May 6]. p. 14014\u201331. Available from: https:\/\/aclanthology.org\/2023.acl-long.783\/.","DOI":"10.18653\/v1\/2023.acl-long.783"},{"key":"463_CR45","doi-asserted-by":"crossref","unstructured":"Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. Transformers: State-of-the-Art Natural Language Processing. In: Liu Q, Schlangen D, editors. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics; 2020 [cited 2024 Sep 5]. p. 38\u201345. Available from: https:\/\/aclanthology.org\/2020.emnlp-demos.6.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"463_CR46","doi-asserted-by":"crossref","unstructured":"McKinney W. Data Structures for Statistical Computing in Python. In: Walt S van der, Millman J, editors. Proceedings of the 9th Python in Science Conference. 2010. p. 56\u201361.","DOI":"10.25080\/Majora-92bf1922-00a"},{"key":"463_CR47","unstructured":"Dettmers T, Lewis M, Belkada Y, Zettlemoyer L. LLM.int8(): 8-bit matrix multiplication for transformers at scale. Proceedings of the 36th International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc.; 2024 [cited 2024 Sep 5]. p. 30318\u201332. Available from: https:\/\/dl.acm.org\/doi\/10.5555\/3600270.3602468."},{"key":"463_CR48","unstructured":"Pl\u00fcster B. LeoLM: Igniting German-Language LLM Research | LAION Blog Post. 2023 [cited 2024 Sep 26]. Available from: https:\/\/laion.ai\/blog\/leo-lm."},{"key":"463_CR49","unstructured":"Meta AI. Introducing Meta Llama 3: The most capable openly available LLM to date. [cited 2024 Apr 25]. Available from: https:\/\/ai.meta.com\/blog\/meta-llama-3\/."},{"key":"463_CR50","unstructured":"Kaplan J, McCandlish S, Henighan T, Brown TB, Chess B, Child R, et al. Scaling Laws for Neural Language Models. arXiv; 2020 [cited 2024 Oct 16].
Available from: http:\/\/arxiv.org\/abs\/2001.08361."},{"key":"463_CR51","doi-asserted-by":"crossref","unstructured":"He Q, Zeng J, Huang W, Chen L, Xiao J, He Q, et al. Can Large Language Models Understand Real-World Complex Instructions? Proceedings of the AAAI Conference on Artificial Intelligence. 2024 [cited 2024 Oct 14];38:18188\u201396. Available from: https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/29777.","DOI":"10.1609\/aaai.v38i16.29777"},{"key":"463_CR52","doi-asserted-by":"crossref","unstructured":"Ding N, Qin Y, Yang G, Wei F, Yang Z, Su Y, et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nat Mach Intell. 2023 [cited 2024 Sep 26];5:220\u201335. Available from: https:\/\/www.nature.com\/articles\/s42256-023-00626-4.","DOI":"10.1038\/s42256-023-00626-4"},{"key":"463_CR53","doi-asserted-by":"publisher","unstructured":"Melnik S, Brix T, Storck M, Riepenhausen S, Varghese J, Rudack C. Overview of German Clinical Text Corpora for Large Language Models \u2013 Scoping Review. GMDS Kooperationstagung \u201cGesundheit - gemeinsam\u201d 2024. German Medical Science GMS Publishing House; 2024. p. DocAbstr. 743. Available from: https:\/\/doi.org\/10.3205\/24gmds079.","DOI":"10.3205\/24gmds079"},{"key":"463_CR54","first-page":"835","volume":"302","author":"F Meineke","year":"2023","unstructured":"Meineke F, Modersohn L, Loeffler M, Boeker M. Announcement of the German Medical Text Corpus Project (GeMTeX). Stud Health Technol Inform. 2023;302:835\u20136.","journal-title":"Stud Health Technol Inform"},{"key":"463_CR55","doi-asserted-by":"crossref","unstructured":"Richter-Pechanski P, Wiesenbach P, Schwab DM, Kiriakou C, He M, Allers MM, et al. A distributable German clinical corpus containing cardiovascular clinical routine doctor\u2019s letters. Sci Data. 2023 [cited 2024 Jan 31];10:207. Available from: https:\/\/www.nature.com\/articles\/s41597-023-02128-9.","DOI":"10.1038\/s41597-023-02128-9"},{"key":"463_CR56","doi-asserted-by":"crossref","unstructured":"Kittner M, Lamping M, Rieke DT, G\u00f6tze J, Bajwa B, Jelas I, et al. Annotation and initial evaluation of a large annotated German oncological corpus. JAMIA Open. 2021 [cited 2024 Feb 14];4:ooab025. Available from: https:\/\/academic.oup.com\/jamiaopen\/article\/doi\/10.1093\/jamiaopen\/ooab025\/6236337.","DOI":"10.1093\/jamiaopen\/ooab025"},{"key":"463_CR57","doi-asserted-by":"crossref","unstructured":"Frei J, Kramer F. GERNERMED: An open German medical NER model. Software Impacts. 2022 [cited 2024 Oct 16];11:100212. Available from: https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2665963821000944.","DOI":"10.1016\/j.simpa.2021.100212"},{"key":"463_CR58","doi-asserted-by":"publisher","unstructured":"Modersohn L, Schulz S, Lohr C, Hahn U. GRASCCO - The First Publicly Shareable, Multiply-Alienated German Clinical Text Corpus. Stud Health Technol Inform. 2022;296:66\u201372. Available from: https:\/\/doi.org\/10.3233\/SHTI220805.","DOI":"10.3233\/SHTI220805"},{"issue":"4","key":"463_CR59","doi-asserted-by":"publisher","first-page":"ooac087","DOI":"10.1093\/jamiaopen\/ooac087","volume":"5","author":"M Lentzen","year":"2022","unstructured":"Lentzen M, Madan S, Lage-Rupprecht V, K\u00fchnel L, Fluck J, Jacobs M, et al. Critical assessment of transformer-based AI models for German clinical notes. JAMIA Open. 2022;5(4): ooac087.
https:\/\/doi.org\/10.1093\/jamiaopen\/ooac087.","journal-title":"JAMIA Open"},{"key":"463_CR60","doi-asserted-by":"crossref","unstructured":"Bressem KK, Papaioannou J-M, Grundmann P, Borchert F, Adams LC, Liu L, et al. medBERT.de: A comprehensive German BERT model for the medical domain. Expert Systems with Applications. 2024 [cited 2024 Oct 17];237:121598. Available from: https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0957417423021000.","DOI":"10.1016\/j.eswa.2023.121598"},{"key":"463_CR61","unstructured":"Chung HW, Hou L, Longpre S, Zoph B, Tay Y, Fedus W, et al. Scaling instruction-finetuned language models. Journal of Machine Learning Research. 2024 [cited 2024 Sep 25];25:1\u201353. Available from: https:\/\/www.jmlr.org\/papers\/v25\/23-0870.html."},{"key":"463_CR62","doi-asserted-by":"crossref","unstructured":"Chan B, Schweter S, M\u00f6ller T. German\u2019s Next Language Model. In: Scott D, Bel N, Zong C, editors. Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online): International Committee on Computational Linguistics; 2020 [cited 2024 Oct 17]. p. 6788\u201396. Available from: https:\/\/aclanthology.org\/2020.coling-main.598.","DOI":"10.18653\/v1\/2020.coling-main.598"},{"key":"463_CR63","doi-asserted-by":"publisher","first-page":"793","DOI":"10.1007\/s00117-024-01349-2","volume":"64","author":"A Mittermeier","year":"2024","unstructured":"Mittermeier A, A\u00dfenmacher M, Schachtner B, Grosu S, Dakovic V, Kandratovich V, et al. Automatic ICD-10 coding\u202f: Natural language processing for German MRI reports. Radiologie (Heidelb). 2024;64:793\u2013800. https:\/\/doi.org\/10.1007\/s00117-024-01349-2.","journal-title":"Radiologie (Heidelb)"},{"key":"463_CR64","doi-asserted-by":"crossref","unstructured":"Soroush A, Glicksberg BS, Zimlichman E, Barash Y, Freeman R, Charney AW, et al. Large Language Models Are Poor Medical Coders \u2014 Benchmarking of Medical Code Querying. NEJM AI. 2024 [cited 2024 Jun 5];1:AIdbp2300040. Available from: https:\/\/ai.nejm.org\/doi\/full\/10.1056\/AIdbp2300040.","DOI":"10.1056\/AIdbp2300040"},{"key":"463_CR65","doi-asserted-by":"crossref","unstructured":"Dirschedl P, Reichle M, R\u00f6ther M. Modellprojekt Kodierqualit\u00e4t. Gesundheitswesen. 2003 [cited 2024 Sep 26];65:1\u20137. Available from: http:\/\/www.thieme-connect.de\/DOI\/10.1055\/s-2003-36914.","DOI":"10.1055\/s-2003-36914"},{"key":"463_CR66","doi-asserted-by":"crossref","unstructured":"Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment. JMIR Medical Education. 2023 [cited 2024 Oct 18];9:e45312. Available from: https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC9947764\/.","DOI":"10.2196\/45312"},{"key":"463_CR67","doi-asserted-by":"crossref","unstructured":"Brin D, Sorin V, Vaid A, Soroush A, Glicksberg BS, Charney AW, et al. Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments. Sci Rep. 2023 [cited 2024 Oct 18];13:16492. Available from: https:\/\/www.nature.com\/articles\/s41598-023-43436-9.","DOI":"10.1038\/s41598-023-43436-9"},{"key":"463_CR68","unstructured":"Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Proceedings of the 34th International Conference on Neural Information Processing Systems.
Red Hook, NY, USA: Curran Associates Inc.; 2020 [cited 2025 Jan 9]. p. 9459\u201374. Available from: https:\/\/dl.acm.org\/doi\/abs\/10.5555\/3495724.3496517."},{"key":"463_CR69","doi-asserted-by":"publisher","first-page":"btae353","DOI":"10.1093\/bioinformatics\/btae353","volume":"40","author":"N Matsumoto","year":"2024","unstructured":"Matsumoto N, Moran J, Choi H, Hernandez ME, Venkatesan M, Wang P, et al. Kragen: a knowledge graph-enhanced RAG framework for biomedical problem solving using large language models. Bioinformatics. 2024;40:btae353. https:\/\/doi.org\/10.1093\/bioinformatics\/btae353.","journal-title":"Bioinformatics"},{"key":"463_CR70","doi-asserted-by":"crossref","unstructured":"Unlu O, Shin J, Mailly CJ, Oates MF, Tucci MR, Varugheese M, et al. Retrieval-Augmented Generation\u2013Enabled GPT-4 for Clinical Trial Screening. NEJM AI. 2024 [cited 2024 Jun 26];0:AIoa2400181. Available from: https:\/\/ai.nejm.org\/doi\/full\/10.1056\/AIoa2400181.","DOI":"10.1056\/AIoa2400181"},{"key":"463_CR71","doi-asserted-by":"crossref","unstructured":"Klang E, Tessler I, Apakama DU, Abbott E, Glicksberg BS, Arnold M, et al. Assessing Retrieval-Augmented Large Language Model Performance in Emergency Department ICD-10-CM Coding Compared to Human Coders. medRxiv; 2024 [cited 2024 Oct 22]. p. 2024.10.15.24315526. Available from: https:\/\/www.medrxiv.org\/content\/10.1101\/2024.10.15.24315526v1.","DOI":"10.1101\/2024.10.15.24315526"},{"key":"463_CR72","doi-asserted-by":"crossref","unstructured":"Budler LC, Chen H, Chen A, Topaz M, Tam W, Bian J, et al. A Brief Review on Benchmarking for Large Language Models Evaluation in Healthcare. WIREs Data Mining and Knowledge Discovery. 2025 [cited 2025 May 5];15:e70010. Available from: https:\/\/onlinelibrary.wiley.com\/doi\/abs\/10.1002\/widm.70010.","DOI":"10.1002\/widm.70010"},{"key":"463_CR73","doi-asserted-by":"crossref","unstructured":"Chang Y, Wang X, Wang J, Wu Y, Yang L, Zhu K, et al. A Survey on Evaluation of Large Language Models. ACM Trans Intell Syst Technol. 2024 [cited 2025 May 5];15:39:1\u201339:45. Available from: https:\/\/dl.acm.org\/doi\/10.1145\/3641289.","DOI":"10.1145\/3641289"},{"key":"463_CR74","doi-asserted-by":"crossref","unstructured":"Liu L, Lian L, Hao Y, Pace A, Kim E, Homsi N, et al. Human-level information extraction from clinical reports with fine-tuned language models. Health Informatics; 2024 [cited 2025 May 5].
Available from: http:\/\/medrxiv.org\/lookup\/doi\/10.1101\/2024.11.18.24317466.","DOI":"10.1101\/2024.11.18.24317466"}],"container-title":["BioData Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-025-00463-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13040-025-00463-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-025-00463-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T13:59:08Z","timestamp":1753365548000},"score":1,"resource":{"primary":{"URL":"https:\/\/biodatamining.biomedcentral.com\/articles\/10.1186\/s13040-025-00463-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,24]]},"references-count":74,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["463"],"URL":"https:\/\/doi.org\/10.1186\/s13040-025-00463-8","relation":{},"ISSN":["1756-0381"],"issn-type":[{"type":"electronic","value":"1756-0381"}],"subject":[],"published":{"date-parts":[[2025,7,24]]},"assertion":[{"value":"21 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This study is based on a dataset of anonymized doctors\u2019 letters of patients who had passed away more than 10\u00a0years before the anonymization of the letters. The letters were collected and anonymized for a previous study. As there was no additional data acquisition or involvement of patients, ethics approval was waived.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable. The manuscript does not contain data from any individual person.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"48"}}
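
The record above is a standard Crossref REST API work message for the article (DOI 10.1186/s13040-025-00463-8). As a minimal illustrative sketch, not part of the record itself, the following Python snippet shows how such a message can be retrieved from the public Crossref endpoint https://api.crossref.org/works/{DOI} and how the fields shown above map onto the parsed JSON. The field names come directly from the record; the script itself is an assumption added here for illustration.

# Minimal sketch (assumption, for illustration only): fetch the Crossref work
# record shown above and read a few of its fields. Uses only the standard library.
import json
import urllib.request

DOI = "10.1186/s13040-025-00463-8"  # DOI of the article described in the record
URL = f"https://api.crossref.org/works/{DOI}"  # public Crossref REST API endpoint

with urllib.request.urlopen(URL) as resp:
    record = json.load(resp)

work = record["message"]  # the "message" object holds the work metadata
print(work["title"][0])            # article title (Crossref stores titles as a list)
print(work["container-title"][0])  # journal name: BioData Mining
print(work["references-count"])    # number of references: 74
for author in work["author"]:      # author list as given/family name pairs
    print(author["given"], author["family"])

Note that, as in the record above, multi-valued fields such as "title", "container-title", and "license" are always JSON arrays in Crossref messages, even when they contain a single entry, so indexing with [0] is the usual access pattern.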