{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T15:22:53Z","timestamp":1778685773689,"version":"3.51.4"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T00:00:00Z","timestamp":1706054400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T00:00:00Z","timestamp":1706054400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>One of the major barriers to using large language models (LLMs) in medicine is the perception they use uninterpretable methods to make clinical decisions that are inherently different from the cognitive processes of clinicians. In this manuscript we develop diagnostic reasoning prompts to study whether LLMs can imitate clinical reasoning while accurately forming a diagnosis. We find that GPT-4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can imitate clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether an LLMs response is likely correct and can be trusted for patient care. Prompting methods that use diagnostic reasoning have the potential to mitigate the \u201cblack box\u201d limitations of LLMs, bringing them one step closer to safe and effective use in medicine.<\/jats:p>","DOI":"10.1038\/s41746-024-01010-1","type":"journal-article","created":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T10:02:22Z","timestamp":1706090542000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":214,"title":["Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine"],"prefix":"10.1038","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4828-5802","authenticated-orcid":false,"given":"Thomas","family":"Savage","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2024-3683","authenticated-orcid":false,"given":"Ashwin","family":"Nayak","sequence":"additional","affiliation":[]},{"given":"Robert","family":"Gallo","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5168-3508","authenticated-orcid":false,"given":"Ekanath","family":"Rangan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4387-8740","authenticated-orcid":false,"given":"Jonathan H.","family":"Chen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,1,24]]},"reference":[{"key":"1010_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41591-023-02448-8","volume":"29","author":"AJ Thirunavukarasu","year":"2023","unstructured":"Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1\u201311 (2023).","journal-title":"Nat. Med."},{"key":"1010_CR2","doi-asserted-by":"publisher","first-page":"2399","DOI":"10.1056\/NEJMsr2214184","volume":"388","author":"P Lee","year":"2023","unstructured":"Lee, P., Bubeck, S. & Petro, J. 
Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N. Engl. J. Med. 388, 2399\u20132400 (2023).","journal-title":"N. Engl. J. Med."},{"key":"1010_CR3","doi-asserted-by":"publisher","first-page":"e232561","DOI":"10.1001\/jamainternmed.2023.2561","volume":"183","author":"A Nayak","year":"2023","unstructured":"Nayak, A. et al. Comparison of history of present illness summaries generated by a chatbot and senior internal medicine residents. JAMA Intern. Med. 183, e232561 (2023).","journal-title":"JAMA Intern. Med."},{"key":"1010_CR4","doi-asserted-by":"publisher","first-page":"e0000198","DOI":"10.1371\/journal.pdig.0000198","volume":"2","author":"TH Kung","year":"2023","unstructured":"Kung, T. H. et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLoS Digit. Health 2, e0000198 (2023).","journal-title":"PLoS Digit. Health"},{"key":"1010_CR5","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1001\/jamainternmed.2023.1838","volume":"183","author":"JW Ayers","year":"2023","unstructured":"Ayers, J. W. et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern. Med. 183, 589\u2013596 (2023).","journal-title":"JAMA Intern. Med."},{"key":"1010_CR6","doi-asserted-by":"publisher","first-page":"842","DOI":"10.1001\/jama.2023.1044","volume":"329","author":"A Sarraju","year":"2023","unstructured":"Sarraju, A. et al. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA 329, 842\u2013844 (2023).","journal-title":"JAMA"},{"key":"1010_CR7","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","volume":"620","author":"K Singhal","year":"2023","unstructured":"Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172\u2013180 (2023).","journal-title":"Nature"},{"key":"1010_CR8","doi-asserted-by":"publisher","unstructured":"Singhal, K. et al. Towards expert-level medical question answering with large language models. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2305.09617 (2023).","DOI":"10.48550\/arXiv.2305.09617"},{"key":"1010_CR9","doi-asserted-by":"publisher","unstructured":"Nori, H., King, N., McKinney, S. M., Carignan, D. & Horvitz, E. Capabilities of GPT-4 on medical challenge problems. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2303.13375 (2023).","DOI":"10.48550\/arXiv.2303.13375"},{"key":"1010_CR10","doi-asserted-by":"crossref","unstructured":"Ali, R. et al. Performance of ChatGPT and GPT-4 on neurosurgery written board examinations. Neurosurgery. 93, 1353\u20131365 (2023).","DOI":"10.1227\/neu.0000000000002632"},{"key":"1010_CR11","doi-asserted-by":"crossref","unstructured":"Ali, R. et al. Performance of ChatGPT, GPT-4, and Google Bard on a neurosurgery oral boards preparation question bank. Neurosurgery. 93, 1090\u20131098 (2023).","DOI":"10.1227\/neu.0000000000002551"},{"key":"1010_CR12","doi-asserted-by":"publisher","first-page":"1028","DOI":"10.1001\/jamainternmed.2023.2909","volume":"183","author":"E Strong","year":"2023","unstructured":"Strong, E. et al. Chatbot vs medical student performance on free-response clinical reasoning examinations. JAMA Intern. Med. 183, 1028\u20131030 (2023).","journal-title":"JAMA Intern. 
Med."},{"key":"1010_CR13","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1001\/jama.2023.8288","volume":"330","author":"Z Kanjee","year":"2023","unstructured":"Kanjee, Z., Crowe, B. & Rodman, A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 330, 78\u201380 (2023).","journal-title":"JAMA"},{"key":"1010_CR14","unstructured":"Brown, T. B. et al. Language models are few-shot learners. In Proc. of the 34th International Conference on Neural Information Processing Systems (NIPS'20). 159, 1877\u20131901 (Curran Associates Inc., Red Hook, NY, USA)."},{"key":"1010_CR15","unstructured":"Peng, B., Li, C., He, P., Galley, M. & Gao, J. Instruction tuning with GPT-4. Preprint at http:\/\/arxiv.org\/abs\/2304.03277 (2023)."},{"key":"1010_CR16","doi-asserted-by":"publisher","unstructured":"Wang, J. et al. Prompt engineering for healthcare: methodologies and applications. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2304.14670 (2023).","DOI":"10.48550\/arXiv.2304.14670"},{"key":"1010_CR17","unstructured":"Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Preprint at http:\/\/arxiv.org\/abs\/2201.11903 (2023)."},{"key":"1010_CR18","unstructured":"Lightman, H. et al. Let\u2019s verify step by step. Preprint at http:\/\/arxiv.org\/abs\/2305.20050 (2023)."},{"key":"1010_CR19","unstructured":"OpenAI. 2023. OpenAI GPT-3.5 API [text-davinci-003] and GPT-4 API. Available at: https:\/\/platform.openai.com."},{"key":"1010_CR20","doi-asserted-by":"publisher","unstructured":"Jin, D. et al. What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2009.13081 (2020).","DOI":"10.48550\/arXiv.2009.13081"},{"key":"1010_CR21","unstructured":"Case records of the Massachusetts General Hospital articles. 2020-2023. N. Engl. J. Med. Accessed: May 2023. https:\/\/www.nejm.org\/medical-articles\/case-records-of-the-massachusetts-general-hospital."},{"key":"1010_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-023-00751-9","volume":"6","author":"DW Joyce","year":"2023","unstructured":"Joyce, D. W., Kormilitzin, A., Smith, K. A. & Cipriani, A. Explainable artificial intelligence for mental health through transparency and interpretability for understandability. Npj Digital Med. 6, 1\u20137 (2023).","journal-title":"Npj Digital Med."},{"key":"1010_CR23","unstructured":"UpToDate: industry-leading clinical decision support. Wolters Kluwer. Accessed: June 2023. https:\/\/www.wolterskluwer.com\/en\/solutions\/uptodate."},{"key":"1010_CR24","unstructured":"MKSAP 19. ACP Online. Accessed: June 2023. https:\/\/www.acponline.org\/featured-products\/mksap-19 (2023)."},{"key":"1010_CR25","unstructured":"StatPearls. NCBI Bookshelf. Accessed: June 2023. https:\/\/www.statpearls.com\/."},{"key":"1010_CR26","unstructured":"DSP: The Demonstrate\u2013Search\u2013Predict Framework. Accessed: March 2023. GitHub - stanfordnlp\/dspy at v1. https:\/\/github.com\/stanfordnlp\/dspy\/tree\/v1."},{"key":"1010_CR27","unstructured":"Khattab, O. et al. Demonstrate-Search-Predict: composing retrieval and language models for knowledge-intensive NLP. Preprint at http:\/\/arxiv.org\/abs\/2212.14024 (2023)."},{"key":"1010_CR28","doi-asserted-by":"publisher","unstructured":"Wang, X. et al. Self-consistency improves chain of thought reasoning in language models. 
Preprint at https:\/\/doi.org\/10.48550\/arXiv.2203.11171 (2023).","DOI":"10.48550\/arXiv.2203.11171"}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01010-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01010-1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01010-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,24]],"date-time":"2024-01-24T10:04:37Z","timestamp":1706090677000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01010-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,24]]},"references-count":28,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["1010"],"URL":"https:\/\/doi.org\/10.1038\/s41746-024-01010-1","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,24]]},"assertion":[{"value":"14 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"20"}}
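
The object above is a Crossref REST API work record (message-type "work"). A minimal sketch of how such a record can be retrieved and its fields read, assuming only the public api.crossref.org endpoint and the DOI shown in the record; the third-party `requests` package and the field names accessed are taken from the record itself, and error handling is deliberately minimal:

```python
# Sketch: fetch the Crossref work record for the DOI deposited above.
# Assumes the public Crossref REST API pattern https://api.crossref.org/works/{doi}
# and the `requests` package (pip install requests).
import requests

DOI = "10.1038/s41746-024-01010-1"  # DOI from the record above

resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]  # the "message" object shown above

print(work["title"][0])                # article title
print(work["container-title"][0])      # journal: npj Digital Medicine
print(work["is-referenced-by-count"])  # citation count at indexing time
for ref in work.get("reference", []):  # the 28 deposited references
    print(ref["key"], ref.get("DOI", "no DOI"))
```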