{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T06:24:28Z","timestamp":1763101468767,"version":"3.45.0"},"reference-count":57,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T00:00:00Z","timestamp":1763078400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:sec>\n                    <jats:title>Introduction<\/jats:title>\n                    <jats:p>\n                      We developed an open, large language model (LLM)-based pipeline to extract actionable incidental findings (AIFs) from [\n                      <jats:sup>18<\/jats:sup>\n                      F]fluorodeoxyglucose positron emission tomography-computed tomography ([\n                      <jats:sup>18<\/jats:sup>\n                      F]FDG PET-CT) reports. This imaging modality often uncovers AIFs, which can affect a patient's treatment. The pipeline classifies reports for the presence of AIFs, extracts the relevant sentences, and stores the results in structured JavaScript Object Notation format, enabling use in both short- and long-term applications.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>\n                      Training, validation, and test datasets of 1,999, 248, and 250 lung cancer [\n                      <jats:sup>18<\/jats:sup>\n                      F]FDG PET-CT reports, respectively, were annotated by a nuclear medicine physician. An external test dataset of 460 reports was annotated by two nuclear medicine physicians. The training dataset was used to fine-tune an LLM using QLoRA and chain-of-thought (CoT) prompting. This was evaluated quantitatively and qualitatively on both test datasets.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The pipeline achieved document-level F1 scores of 0.917\u2009\u00b1\u20090.016 and 0.79\u2009\u00b1\u20090.025 on the internal and external test datasets. At the sentence-level, F1 scores of 0.754\u2009\u00b1\u20090.011 and 0.522\u2009\u00b1\u20090.012 were recorded, and qualitative analysis demonstrated even higher practical utility. This qualitative analysis revealed how sentence-level performance is better in practice.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>Llama-3.1-8B Instruct was the base LLM that provided the best combination of performance and computational efficiency. The utilisation of CoT prompting improved performance further. Radiology reporting characteristics such as length and style affect model generalisation.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>We find that a QLoRA-adapted LLM utilising CoT prompting successfully extracts AIF information at both document- and sentence-level from both internal and external PET-CT reports. We believe this model can assist with short-term clinical challenges like clinical alerts and reminders, and long-term tasks like investigating comorbidities.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/fdgth.2025.1702082","type":"journal-article","created":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T06:21:43Z","timestamp":1763101303000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Open LLM-based actionable incidental finding extraction from [18F]fluorodeoxyglucose PET-CT radiology reports"],"prefix":"10.3389","volume":"7","author":[{"given":"Stephen H.","family":"Barlow","sequence":"first","affiliation":[]},{"given":"Sugama","family":"Chicklore","sequence":"additional","affiliation":[]},{"given":"Yulan","family":"He","sequence":"additional","affiliation":[]},{"given":"Sebastien","family":"Ourselin","sequence":"additional","affiliation":[]},{"given":"Thomas","family":"Wagner","sequence":"additional","affiliation":[]},{"given":"Anna","family":"Barnes","sequence":"additional","affiliation":[]},{"given":"Gary J. R.","family":"Cook","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,11,14]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1186\/s40644-016-0091-3","article-title":"How we read oncologic FDG PET\/CT","volume":"16","author":"Hofman","year":"2016","journal-title":"Cancer Imaging"},{"key":"B2","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1016\/j.carj.2017.08.001","article-title":"Incidence and economic impact of incidental findings on 18F-FDG PET\/CT imaging","volume":"69","author":"Adams","year":"2018","journal-title":"Can Assoc Radiol J"},{"key":"B3","doi-asserted-by":"publisher","first-page":"422","DOI":"10.1016\/j.jacr.2023.01.001","article-title":"White paper: best practices in the communication and management of actionable incidental findings in emergency department imaging","volume":"20","author":"Moore","year":"2023","journal-title":"J Am Coll Radiol"},{"key":"B4","doi-asserted-by":"publisher","first-page":"566","DOI":"10.1016\/j.jacr.2020.11.006","article-title":"Management strategies to promote follow-up care for incidental findings: a scoping review","volume":"18","author":"Crable","year":"2021","journal-title":"J Am Coll Radiol"},{"key":"B5","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s10916-023-01925-4","article-title":"Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios","volume":"47","author":"Cascella","year":"2023","journal-title":"J Med Syst"},{"key":"B6","article-title":"Lora: low-rank adaptation of large language models","author":"Hu","year":""},{"key":"B7","doi-asserted-by":"publisher","first-page":"24824","DOI":"10.48550\/arXiv.2201.11903","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Adv Neural Inf Process Syst"},{"key":"B8","first-page":"1","article-title":"QLoRA: efficient finetuning of quantized LLMs","volume":"36","author":"Dettmers","year":"2024","journal-title":"Adv Neural Inf Process Syst"},{"key":"B9","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1016\/j.annemergmed.2013.02.001","article-title":"Automated detection using natural language processing of radiologists recommendations for additional imaging of incidental findings","volume":"62","author":"Dutta","year":"2013","journal-title":"Ann Emerg Med"},{"key":"B10","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1016\/j.annemergmed.2022.08.450","article-title":"A natural language processing and machine learning approach to identification of incidental radiology findings in trauma patients discharged from the emergency department","volume":"81","author":"Evans","year":"2023","journal-title":"Ann Emerg Med"},{"key":"B11","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.ijmedinf.2019.05.021","article-title":"Identifying incidental findings from radiology reports of trauma patients: an evaluation of automated feature representation methods","volume":"129","author":"Trivedi","year":"2019","journal-title":"Int J Med Inf"},{"key":"B12","doi-asserted-by":"publisher","first-page":"1983","DOI":"10.1093\/jamia\/ocae117","article-title":"Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings","volume":"31","author":"Woo","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"B13","doi-asserted-by":"crossref","DOI":"10.1101\/2022.12.02.22283043","article-title":"Identifying secondary findings in PET\/CT reports in oncological cases: a quantifying study using automated natural language processing","volume-title":"medRxiv","author":"Sekler","year":""},{"key":"B14","article-title":"Gpt-4 technical report","author":"Achiam","year":""},{"key":"B15","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1007\/s43678-023-00616-w","article-title":"Repeatability, reproducibility, and diagnostic accuracy of a commercial large language model (ChatGPT) to perform emergency department triage using the Canadian triage and acuity scale","volume":"26","author":"Franc","year":"2024","journal-title":"Can J Emerg Med"},{"key":"B16","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1186\/s12911-024-02814-7","article-title":"Uncertainty-aware automatic TNM staging classification for [18F] fluorodeoxyglucose PET-CT reports for lung cancer utilising transformer-based language models and multi-task learning","volume":"24","author":"Barlow","year":"2024","journal-title":"BMC Med Inform Decis Mak"},{"key":"B17","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1186\/s12885-020-6667-0","article-title":"Guy\u2019s cancer cohort\u2014real world evidence for cancer pathways","volume":"20","author":"Moss","year":"2020","journal-title":"BMC Cancer"},{"key":"B18","article-title":"Management of incidental findings detected during research imaging","year":""},{"key":"B19","article-title":"Recommendations on alerts and notification of imaging reports","year":""},{"key":"B20","article-title":"Incidental findings","year":""},{"key":"B21","doi-asserted-by":"publisher","first-page":"567","DOI":"10.1148\/radiol.2016152188","article-title":"\u201cChasing a ghost\u201d: factors that influence primary care physicians to follow up on incidental imaging findings","volume":"281","author":"Zafar","year":"2016","journal-title":"Radiology"},{"key":"B22","doi-asserted-by":"publisher","first-page":"2859","DOI":"10.3390\/ijms24032859","article-title":"Mechanisms contributing to the comorbidity of COPD and lung cancer","volume":"24","author":"Forder","year":"2023","journal-title":"Int J Mol Sci"},{"key":"B23","article-title":"The Llama 3 herd of models","author":"Grattafiori","year":""},{"key":"B24","article-title":"Llama: open and efficient foundation language models","author":"Touvron","year":""},{"key":"B25","article-title":"Phi-3 technical report: a highly capable language model locally on your phone","author":"Abdin","year":""},{"key":"B26","article-title":"Textbooks are all you need II: Phi-1.5 technical report","author":"Li","year":""},{"key":"B27","article-title":"Gemma: open models based on Gemini research and technology","author":"Mesnard","year":""},{"key":"B28","article-title":"Mistral 7b","author":"Jiang","year":""},{"key":"B29","article-title":"Introducing Openbiollm-Llama3-70b & 8b: Saama\u2019s AI research lab released the most openly available medical-domain Llms to date","year":""},{"key":"B30","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1038\/s41746-022-00742-2","article-title":"A large language model for electronic health records","volume":"5","author":"Yang","year":"2022","journal-title":"NPJ Digit Med"},{"key":"B31","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2022.emnlp-main.759","article-title":"Rethinking the role of demonstrations: what makes in-context learning work?","author":"Min","year":"2022"},{"key":"B32","author":"Tam","year":""},{"key":"B33","doi-asserted-by":"publisher","first-page":"1877","DOI":"10.48550\/arXiv.2005.14165","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"B34","article-title":"The curious case of neural text degeneration","author":"Holtzman","year":""},{"key":"B35","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1038\/s41746-018-0070-0","article-title":"Natural language generation for electronic health records","volume":"1","author":"Lee","year":"2018","journal-title":"NPJ Digit Med"},{"key":"B36","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/W17-3207","article-title":"Beam search strategies for neural machine translation","author":"Freitag","year":"2017"},{"key":"B37","article-title":"8-Bit optimizers via block-wise quantization","author":"Dettmers","year":""},{"key":"B38","first-page":"31","article-title":"Accelerating neural network training: a brief review","author":"Nokhwal","year":"2024"},{"key":"B39","doi-asserted-by":"publisher","first-page":"S4","DOI":"10.1186\/1472-6947-15-S2-S4","article-title":"Detection of sentence boundaries and abbreviations in clinical narratives","volume":"15","author":"Kreuzthaler","year":"2015","journal-title":"BMC Med Inform Decis Mak"},{"key":"B40","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"key":"B41","first-page":"212","article-title":"The measurement of interrater agreement","volume-title":"Statistical Methods for Rates and Proportions","author":"Fleiss","year":"1981"},{"key":"B42","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1001\/jama.2024.21700","article-title":"Testing and evaluation of health care applications of large language models: a systematic review","volume":"333","author":"Bedi","year":"2025","journal-title":"J Am Med Assoc"},{"key":"B43","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1186\/s12911-025-02954-4","article-title":"A systematic review of large language model (LLM) evaluations in clinical medicine","volume":"25","author":"Shool","year":"2025","journal-title":"BMC Med Inform Decis Mak"},{"key":"B44","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1186\/s12911-024-02709-7","article-title":"Analyzing evaluation methods for large language models in the medical field: a scoping review","volume":"24","author":"Lee","year":"2024","journal-title":"BMC Med Inform Decis Mak"},{"key":"B45","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1186\/s12911-017-0430-8","article-title":"Effects of workload, work complexity, and repeated alerts on alert fatigue in a clinical decision support system","volume":"17","author":"Ancker","year":"2017","journal-title":"BMC Med Inform Decis Mak"},{"key":"B46","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1093\/jamia\/ocac191","article-title":"Distinct components of alert fatigue in physicians\u2019 responses to a noninterruptive clinical decision support alert","volume":"30","author":"Murad","year":"2023","journal-title":"J Am Med Inform Assoc"},{"key":"B47","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.acl-long.63","article-title":"ConSiDERS-the-human evaluation framework: rethinking human evaluation for generative large language models","author":"Elangovan","year":"2024"},{"key":"B48","doi-asserted-by":"crossref","DOI":"10.1145\/3654777.3676450","article-title":"Who validates the validators? Aligning LLM-assisted evaluation of LLM outputs with human preferences","author":"Shankar","year":"2024"},{"key":"B49","doi-asserted-by":"publisher","first-page":"516","DOI":"10.1053\/j.sult.2010.08.002","article-title":"Positron emission tomography-computed tomography reporting in radiation therapy planning and response assessment","volume":"31","author":"Rohren","year":"2010","journal-title":"Semin Ultrasound CT MR"},{"key":"B50","doi-asserted-by":"publisher","first-page":"1262","DOI":"10.2214\/ajr.16.17584","article-title":"Journal club: structured feedback from patients on actual radiology reports: a novel approach to improve reporting practices","volume":"208","author":"Gunn","year":"2017","journal-title":"Am J Roentgenol"},{"key":"B51","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3703155","article-title":"A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions","volume":"43","author":"Huang","year":"2025","journal-title":"ACM Trans Inf Syst"},{"key":"B52","doi-asserted-by":"publisher","first-page":"1495582","DOI":"10.3389\/fmed.2024.1495582","article-title":"Why we need to be careful with LLMs in medicine","volume":"11","author":"B\u00e9lisle-Pipon","year":"2024","journal-title":"Front Med"},{"key":"B53","article-title":"Deepseek-R1: incentivizing reasoning capability in llms via reinforcement learning","author":"Guo","year":""},{"key":"B54","doi-asserted-by":"publisher","first-page":"e50638","DOI":"10.2196\/50638","article-title":"Prompt engineering as an important emerging skill for medical professionals: tutorial","volume":"25","author":"Mesk\u00f3","year":"2023","journal-title":"J Med Internet Res"},{"key":"B55","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1016\/j.eng.2024.04.002","article-title":"Preventing the immense increase in the life-cycle energy and carbon footprints of LLM-powered intelligent chatbots","volume":"40","author":"Jiang","year":"2024","journal-title":"Engineering"},{"key":"B56","doi-asserted-by":"publisher","first-page":"20220108","DOI":"10.1259\/bjr.20220108","article-title":"Incidental findings on brain magnetic resonance imaging (MRI) in adults: a review of imaging spectrum, clinical significance, and management","volume":"96","author":"Wangaryattawanich","year":"2023","journal-title":"Br J Radiol"},{"key":"B57","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.acl-long.70","article-title":"Deepseekmoe: toward ultimate expert specialization in mixture-of-experts language models","author":"Dai","year":""}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1702082\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T06:21:47Z","timestamp":1763101307000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1702082\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,14]]},"references-count":57,"alternative-id":["10.3389\/fdgth.2025.1702082"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1702082","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,14]]},"article-number":"1702082"}}