{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T13:30:23Z","timestamp":1779370223508,"version":"3.53.0"},"reference-count":35,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T00:00:00Z","timestamp":1737936000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Large Language Models (LLMs) offer considerable potential to enhance various aspects of healthcare, from aiding with administrative tasks to clinical decision support. However, despite the growing use of LLMs in healthcare, a critical gap persists in clear, actionable guidelines available to healthcare organizations and providers to ensure their responsible and safe implementation. In this paper, we propose a practical step-by-step approach to bridge this gap and support healthcare organizations and providers in warranting the responsible and safe implementation of LLMs into healthcare. The recommendations in this manuscript include protecting patient privacy, adapting models to healthcare-specific needs, adjusting hyperparameters appropriately, ensuring proper medical prompt engineering, distinguishing between clinical decision support (CDS) and non-CDS applications, systematically evaluating LLM outputs using a structured approach, and implementing a solid model governance structure. We furthermore propose the ACUTE mnemonic; a structured approach for assessing LLM responses based on Accuracy, Consistency, semantically Unaltered outputs, Traceability, and Ethical considerations. Together, these recommendations aim to provide healthcare organizations and providers with a clear pathway for the responsible and safe implementation of LLMs into clinical practice.<\/jats:p>","DOI":"10.3389\/frai.2025.1504805","type":"journal-article","created":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T12:31:08Z","timestamp":1737981068000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Bridging the gap: a practical step-by-step approach to warrant safe implementation of large language models in healthcare"],"prefix":"10.3389","volume":"8","author":[{"given":"Jessica D.","family":"Workum","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Davy","family":"van de Sande","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Diederik","family":"Gommers","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michel E.","family":"van Genderen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,1,27]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1038\/s41746-024-01074-z","article-title":"Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI","volume":"7","author":"Abbasian","year":"2024","journal-title":"NPJ Digit Med"},{"key":"B2","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1001\/jamainternmed.2023.1838","article-title":"Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum","volume":"183","author":"Ayers","year":"2023","journal-title":"JAMA Intern. Med"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2404.01077","article-title":"Efficient prompting methods for large language models: a survey","author":"Chang","year":"2024","journal-title":"arXiv [Preprint]."},{"key":"B4","doi-asserted-by":"publisher","first-page":"e379","DOI":"10.1016\/S2589-7500(24)00060-8","article-title":"The effect of using a large language model to respond to patient messages","volume":"6","author":"Chen","year":"2024","journal-title":"Lancet Digital Health"},{"key":"B5","doi-asserted-by":"publisher","first-page":"2023","DOI":"10.1056\/AIp2300031","article-title":"Use of GPT-4 to diagnose complex clinical cases","volume":"1","author":"Eriksen","year":"2023","journal-title":"NEJM AI"},{"key":"B6","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1056\/AIcs2300235","article-title":"GPT-4 for information retrieval and comparison of medical oncology guidelines","volume":"1","author":"Ferber","year":"2024","journal-title":"NEJM AI"},{"key":"B7","doi-asserted-by":"publisher","first-page":"e243201","DOI":"10.1001\/jamanetworkopen.2024.3201","article-title":"Artificial intelligence\u2013generated draft replies to patient inbox messages","volume":"7","author":"Garcia","year":"2024","journal-title":"JAMA Netw. Open"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.4809363","article-title":"A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics","author":"He","year":"2023","journal-title":"arXiv [Preprint]"},{"key":"B9","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1093\/clinchem\/hvad011","article-title":"FDA regulation of laboratory clinical decision support software: is it a medical device?","volume":"69","author":"Jackups","year":"2023","journal-title":"Clin. Chem"},{"key":"B10","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1056\/AIdbp2300192","article-title":"GPT versus resident physicians \u2014 a benchmark based on official board scores","volume":"1","author":"Katz","year":"2024","journal-title":"NEJM AI"},{"key":"B11","doi-asserted-by":"publisher","first-page":"e17567","DOI":"10.2196\/17567","article-title":"Medical device apps: An introduction to regulatory affairs for developers","volume":"8","author":"Keutzer","year":"2020","journal-title":"JMIR Mhealth Uhealth"},{"key":"B12","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1016\/j.ejim.2024.09.017","article-title":"A large language model-based clinical decision support system for syncope recognition in the emergency department: a framework for clinical workflow integration","volume":"131","author":"Levra","year":"2024","journal-title":"Eur. J. Intern. Med."},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2405.16402","article-title":"Assessing empathy in large language models with real-world physician-patient interactions","author":"Luo","year":"2024","journal-title":"arXiv [Preprint]."},{"key":"B14","unstructured":"Mao\n              R.\n            \n            \n              Chen\n              G.\n            \n            \n              Zhang\n              X.\n            \n            \n              Guerin\n              F.\n            \n            \n              Cambria\n              E.\n            \n          \n          GPTEval: A Survey on Assessments of ChatGPT and GPT-4\n          \n          2023"},{"key":"B15","doi-asserted-by":"publisher","first-page":"148","DOI":"10.3390\/medicina60010148","article-title":"Chain of thought utilization in large language models and application in nephrology","volume":"60","author":"Miao","year":"2024","journal-title":"Medicina (Lithuania)"},{"key":"B16","doi-asserted-by":"publisher","first-page":"7919","DOI":"10.1007\/s12652-023-04601-0","article-title":"A model to improve user acceptance of e-services in healthcare systems based on technology acceptance model: an empirical study","volume":"14","author":"Nazari-Shirkouhi","year":"2023","journal-title":"J. Ambient Intell. Humaniz. Comput."},{"key":"B17","doi-asserted-by":"publisher","first-page":"380","DOI":"10.1056\/AIra2400380","article-title":"RAG in health care: a novel framework for improving communication and decision-making by addressing LLM limitations","volume":"2","author":"Ng","year":"2025","journal-title":"NEJM AI"},{"key":"B18","unstructured":"Nori\n              H.\n            \n            \n              King\n              N.\n            \n            \n              Mckinney\n              S. M.\n            \n            \n              Carignan\n              D.\n            \n            \n              Horvitz\n              E.\n            \n            \n              Openai\n              M.\n            \n          \n          Capabilities of GPT-4 on Medical Challenge Problems\n          \n          2023"},{"key":"B19","unstructured":"Open\n              A. I.\n            \n            \n              Achiam\n              J.\n            \n            \n              Adler\n              S.\n            \n            \n              Agarwal\n              S.\n            \n            \n              Ahmad\n              L.\n            \n            \n              Akkaya\n              I.\n            \n          \n          GPT-4 Technical Report\n          \n          2023"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1101\/2023.03.22.23287585","article-title":"Bias amplification in intersectional subpopulations for clinical phenotyping by large language models","author":"Pal","year":"2023","journal-title":"MedRxiv [Preprint]"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.1101\/2024.07.22.24310824","article-title":"Exploring temperature effects on large language models across various clinical tasks","author":"Patel","year":"2024","journal-title":"medRxiv [Preprint]"},{"key":"B22","doi-asserted-by":"publisher","first-page":"116119","DOI":"10.1016\/j.psychres.2024.116119","article-title":"Assessing dimensions of thought disorder with large language models: the tradeoff of accuracy and consistency","volume":"341","author":"Pugh","year":"2024","journal-title":"Psychiatry Res"},{"key":"B23","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1038\/s41746-023-00988-4","article-title":"Generative AI and large language models in health care: pathways to implementation","volume":"7","author":"Raza","year":"2024","journal-title":"NPJ Digit. Med"},{"key":"B24","doi-asserted-by":"publisher","first-page":"10809","DOI":"10.1101\/2024.07.27.24310809","article-title":"Multimodal large language model passes specialty board examination and surpasses human test-taker scores: a comparative analysis examining the stepwise impact of model prompting strategies on performance","volume":"2024","author":"Samaan","year":"2024","journal-title":"medRxiv"},{"key":"B25","doi-asserted-by":"crossref","unstructured":"Schoonbeek\n              R. C.\n            \n            \n              Workum\n              J. D.\n            \n            \n              Schuit\n              S. C. E.\n            \n            \n              Doornberg\n              J. N.\n            \n            \n              Van Der Laan\n              T. P.\n            \n            \n              Bootsma-Robroeks\n              C. M. H. H.T.\n            \n          \n          Completeness, Correctness and Conciseness of Physician-written versus Large Language Model Generated Patient Summaries Integrated in Electronic Health Records\n          \n          2024","DOI":"10.2139\/ssrn.4835935"},{"key":"B26","doi-asserted-by":"publisher","first-page":"E246565","DOI":"10.1001\/jamanetworkopen.2024.6565","article-title":"AI-generated draft replies integrated into health records and physicians' electronic communication","volume":"2024","author":"Tai-Seale","year":"2024","journal-title":"JAMA Netw. Open"},{"key":"B27","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","article-title":"Large language models in medicine","volume":"29","author":"Thirunavukarasu","year":"2023","journal-title":"Nat. Med."},{"key":"B28","doi-asserted-by":"publisher","first-page":"1134","DOI":"10.1038\/s41591-024-02855-5","article-title":"Adapted large language models can outperform medical experts in clinical text summarization","volume":"30","author":"van Veen","year":"2023","journal-title":"Nat. Med."},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2201.11903","article-title":"Chain-of-thought prompting elicits reasoning in large language models chain-of-thought prompting","author":"Wei","year":"2022","journal-title":"arXiv [Preprint]."},{"key":"B30","volume-title":"Guidance on Large Multi-modal Models","year":"2024"},{"key":"B31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1056\/AIdbp2300092","article-title":"Benchmarking open-source large language models, GPT-4 and Claude 2 on multiple-choice questions in nephrology","volume":"1","author":"Wu","year":"2024","journal-title":"NEJM AI"},{"key":"B32","doi-asserted-by":"publisher","first-page":"100211","DOI":"10.1016\/j.hcc.2024.100211","article-title":"A survey on large language model (LLM) security and privacy: the good, the bad, and the ugly","volume":"2024","author":"Yao","year":"2024","journal-title":"High-Confid. Comp"},{"key":"B33","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1056\/AIoa2300068","article-title":"Almanac\u2014retrieval-augmented language models for clinical medicine","volume":"1","author":"Zakka","year":"2024","journal-title":"NEJM AI"},{"key":"B34","doi-asserted-by":"publisher","DOI":"10.1101\/2024.02.07.24302444","article-title":"Comparison of prompt engineering and fine-tuning strategies in large language models in the classification of clinical notes","author":"Zhang","year":"2024","journal-title":"medRxiv [Preprint]"},{"key":"B35","unstructured":"Zhang\n              Y.\n            \n            \n              Hou\n              S.\n            \n            \n              Derek Ma\n              M.\n            \n            \n              Wang\n              W.\n            \n            \n              Chen\n              M.\n            \n            \n              Zhao\n              J.\n            \n          \n          \n          2024"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1504805\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T12:31:13Z","timestamp":1737981073000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1504805\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,27]]},"references-count":35,"alternative-id":["10.3389\/frai.2025.1504805"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1504805","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,27]]},"article-number":"1504805"}}