{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T06:36:20Z","timestamp":1778826980363,"version":"3.51.4"},"reference-count":28,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T00:00:00Z","timestamp":1764115200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Large Language Models (LLMs) have raised broad expectations for clinical use, particularly in the processing of complex medical narratives. However, in practice, more targeted Natural Language Processing (NLP) approaches may offer higher precision and feasibility for symptom extraction from real-world clinical texts. NLP provides promising tools for extracting clinical information from unstructured medical narratives. However, few studies have focused on integrating symptom information from free texts in German, particularly for complex patient groups such as emergency department (ED) patients. The ED setting presents specific challenges: high documentation pressure, heterogeneous language styles, and the need for secure, locally deployable models due to strict data protection regulations. Furthermore, German remains a low-resource language in clinical NLP.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>We implemented and compared two models for zero-shot learning\u2014GLiNER and Mistral\u2014and a fine-tuned BERT-based SCAI-BIO\/BioGottBERT model for named entity recognition (NER) of symptoms, anatomical terms, and negations in German ED anamnesis texts in an on-premises environment in a hospital. Manual annotations of 150 narratives were used for model validation. The postprocessing steps included confidence-based filtering, negation exclusion, symptom standardization, and integration with structured oncology registry data. All computations were performed on local hospital servers in an on-premises implementation to ensure full data protection compliance.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The fine-tuned SCAI-BIO\/BioGottBERT model outperformed both zero-shot approaches, achieving an F1 score of 0.84 for symptom extraction and demonstrating superior performance in negation detection. The validated pipeline enabled systematic extraction of affirmed symptoms from ED-free text, transforming them into structured data. This method allows large-scale analysis of symptom profiles across patient populations and serves as a technical foundation for symptom-based clustering and subgroup analysis.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>Our study demonstrates that modern NLP methods can reliably extract clinical symptoms from German ED free text, even under strict data protection constraints and with limited training resources. Fine-tuned models offer a precise and practical solution for integrating unstructured narratives into clinical decision-making. This work lays the methodological foundation for a new way of systematically analyzing large patient cohorts on the basis of free-text data. Beyond symptoms, this approach can be extended to extracting diagnoses, procedures, or other clinically relevant entities. Building upon this framework, we apply network-based clustering methods (in a subsequent study) to identify clinically meaningful patient subgroups and explore sex- and age-specific patterns in symptom expression.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/fdgth.2025.1623922","type":"journal-article","created":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T06:35:45Z","timestamp":1764138945000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Optimized BERT-based NLP outperforms zero-shot methods for automated symptom detection in clinical practice"],"prefix":"10.3389","volume":"7","author":[{"given":"Juan G.","family":"Diaz Ochoa","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Natalie","family":"Layer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonas","family":"Mahr","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Faizan E","family":"Mustafa","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian U.","family":"Menzel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martina","family":"M\u00fcller","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tobias","family":"Schilling","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gerald","family":"Illerhaus","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Markus","family":"Knott","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alexander","family":"Krohn","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2025,11,26]]},"reference":[{"key":"B1","article-title":"Epiphenomenalism","volume-title":"The Stanford Encyclopedia of Philosophy, Summer 2023","author":"Robinson","year":"2023"},{"key":"B2","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1484\/J.CNT.5.135348","article-title":"Medical anamnesis. Collecting and recollecting the past in medicine","volume":"65","author":"Tybjerg","year":"2023","journal-title":"Centaurus"},{"key":"B3","first-page":"218","article-title":"How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?","volume":"2014","author":"Raghavan","year":"2014","journal-title":"AMIA Jt Summits on Transl Sci Proc"},{"key":"B4","doi-asserted-by":"crossref","DOI":"10.1038\/s41598-025-22940-0","volume-title":"Limitations of Large Language Models in Clinical Problem-Solving Arising from Inflexible Reasoning","author":"Kim","year":"2025"},{"key":"B5","doi-asserted-by":"publisher","first-page":"e2400034","DOI":"10.1200\/CCI.24.00034","article-title":"Development and validation of a natural language processing algorithm for extracting clinical and pathological features of breast cancer from pathology reports","volume":"8","author":"Munzone","year":"2024","journal-title":"JCO Clin Cancer Inform"},{"key":"B6","doi-asserted-by":"publisher","first-page":"286","DOI":"10.4258\/hir.2023.29.4.286","article-title":"Named entity recognition in electronic health records: a methodological review","volume":"29","author":"Durango","year":"2023","journal-title":"Healthc Inform Res"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-025-02871-6","article-title":"Leveraging large language models to mimic domain expert labeling in unstructured text-based electronic healthcare records in non-English languages","volume":"25","author":"Akbasli","year":"2025","journal-title":"BMC Med Inform Decis Mak"},{"key":"B8","first-page":"1","article-title":"Can current NLI systems handle German word order? Investigating language model performance on a new German challenge set of minimal pairs","volume-title":"Proceedings of the 15th International Conference on Computational Semantics","author":"Reinig","year":"2023"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1737","DOI":"10.1109\/JBHI.2021.3123192","article-title":"A deep language model for symptom extraction from clinical text and its application to extract COVID-19 symptoms from social media","volume":"26","author":"Luo","year":"2022","journal-title":"IEEE J Biomed Health Inform"},{"key":"B10","doi-asserted-by":"publisher","first-page":"713","DOI":"10.1016\/j.annemergmed.2023.07.023","article-title":"New coding guidelines reduce emergency department note bloat but more work is needed","volume":"82","author":"Marshall","year":"2023","journal-title":"Ann Emerg Med"},{"key":"B11","doi-asserted-by":"publisher","first-page":"e70278","DOI":"10.1002\/cam4.70278","article-title":"Symptom network analysis and unsupervised clustering of oncology patients identifies drivers of symptom burden and patient subgroups with distinct symptom patterns","volume":"13","author":"Bergsneider","year":"2024","journal-title":"Cancer Med"},{"key":"B12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/medicina61010133","article-title":"Navigating emergency management of cancer patients: a retrospective study on first-time, end-stage, and other established diagnoses in a high turnover emergency county hospital","volume":"61","author":"Corlade-Andrei","year":"2025","journal-title":"Medicina (B Aires)"},{"key":"B13","first-page":"6000","article-title":"Attention is all you need","author":"Vaswani","year":"2017"},{"key":"B14","volume-title":"Named Clinical Entity Recognition Benchmark","author":"Abdul","year":"2024"},{"key":"B15","first-page":"4171","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"B16","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1186\/s12911-024-02825-4","article-title":"The aluminum standard: using generative artificial intelligence tools to synthesize and annotate non-structured patient data","volume":"24","author":"Diaz Ochoa","year":"2024","journal-title":"BMC Med Inform Decis Mak"},{"key":"B17","volume-title":"Mistral 7B","author":"Jiang","year":"2023"},{"key":"B18","volume-title":"GLiNER Multi-Task: Generalist Lightweight Model for Various Information Extraction Tasks","author":"Stepanov","year":"2024"},{"key":"B19","volume-title":"PromptNER: Prompting for Named Entity Recognition","author":"Ashok","year":"2023"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-024-02793-9","article-title":"Integrating structured and unstructured data for predicting emergency severity: an association and predictive study using transformer-based natural language processing models","volume":"24","author":"Zhang","year":"2024","journal-title":"BMC Med Inform Decis Mak"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-7-92","article-title":"Various criteria in the evaluation of biomedical named entity recognition","volume":"7","author":"Tsai","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"B22","doi-asserted-by":"publisher","first-page":"2353","DOI":"10.1038\/s41598-023-29323-3","article-title":"Information extraction from German radiological reports for general clinical text and language understanding","volume":"13","author":"Jantscher","year":"2023","journal-title":"Sci Rep"},{"key":"B23","doi-asserted-by":"crossref","first-page":"38","DOI":"10.18653\/v1\/2020.louhi-1.5","article-title":"GGPONC: a corpus of German medical text with rich metadata based on clinical practice guidelines","volume-title":"Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis","author":"Borchert","year":"2020"},{"key":"B24","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1016\/j.nbt.2023.08.004","article-title":"Machine translation of standardised medical terminology using natural language processing: a scoping review","volume":"77","author":"Noll","year":"2023","journal-title":"New Biotechnol"},{"key":"B25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3648471","article-title":"Utilizing BERT for information retrieval: survey, applications, resources, and challenges","volume":"56","author":"Wang","year":"2024","journal-title":"ACM Comput. Surv"},{"key":"B26","doi-asserted-by":"publisher","first-page":"103953","DOI":"10.1016\/j.artint.2023.103953","article-title":"Are the BERT family zero-shot learners? A study on their potential and limitations","volume":"322","author":"Wang","year":"2023","journal-title":"Artif Intell"},{"key":"B27","doi-asserted-by":"publisher","first-page":"1397388","DOI":"10.3389\/frai.2024.1397388","article-title":"Enhancing diagnostic accuracy in symptom-based health checkers: a comprehensive machine learning approach with clinical vignettes and benchmarking","volume":"7","author":"Aissaoui Ferhi","year":"2024","journal-title":"Front Artif Intell"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1101\/2025.04.21.25326037","article-title":"Optimized BERT-based NLP outperforms zero-shot methods for automated symptom detection in clinical practice","author":"Diaz Ochoa","year":"2025","journal-title":"medRxiv"}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1623922\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T06:35:48Z","timestamp":1764138948000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1623922\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,26]]},"references-count":28,"alternative-id":["10.3389\/fdgth.2025.1623922"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1623922","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,26]]},"article-number":"1623922"}}