{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T19:37:53Z","timestamp":1781725073102,"version":"3.54.5"},"reference-count":75,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T00:00:00Z","timestamp":1776902400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T00:00:00Z","timestamp":1776902400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["186932"],"award-info":[{"award-number":["186932"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Large language models (LLMs) are increasingly used by physicians for diagnostic support. A key advantage of LLMs is the ability to generate explanations that can help physicians understand the reasoning behind a diagnosis. However, the best-suited format for LLM-generated explanations remains unclear. In this large-scale study, we examined the effect of different formats for LLM explanations on clinical decision-making. For this, we conducted a randomized experiment with radiologists reviewing patient cases with radiological images (\n                    <jats:italic>N<\/jats:italic>\n                    = 2020 assessments). 
Participants received either no LLM support (control group) or were supported by one of three LLM-generated explanations: (1) a\n                    <jats:italic>standard output<\/jats:italic>\n                    providing the diagnosis without explanation; (2) a\n                    <jats:italic>differential diagnosis<\/jats:italic>\n                    comparing multiple possible diagnoses; or (3) a\n                    <jats:italic>chain-of-thought<\/jats:italic>\n                    explanation offering a detailed reasoning process for the diagnosis. We find that the format of explanations significantly influences diagnostic accuracy. The chain-of-thought explanations yielded the best performance, improving the diagnostic accuracy by 12.2% compared to the control condition without LLM support (\n                    <jats:italic>P<\/jats:italic>\n                    = 0.001). The chain-of-thought explanations are also superior to the standard output without explanation ( + 7.2%;\n                    <jats:italic>P<\/jats:italic>\n                    = 0.040) and the differential diagnosis format ( + 9.7%;\n                    <jats:italic>P<\/jats:italic>\n                    = 0.004). We further assessed the robustness of these findings across case difficulty and different physician backgrounds, such as general vs. specialized radiologists. Evidently, in the controlled setting of our vignette study, explaining the reasoning for a diagnosis helps physicians to identify and correct potential errors in LLM predictions and thus improve overall decisions. 
Altogether, the results highlight the importance of explanations in medical LLMs to support the reasoning processes of physicians, so that medical LLMs can improve diagnostic performance and, ultimately, patient outcomes.\n                  <\/jats:p>","DOI":"10.1038\/s41746-026-02619-0","type":"journal-article","created":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T16:35:25Z","timestamp":1776962125000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["The effect of medical explanations from large language models on diagnostic accuracy in radiology"],"prefix":"10.1038","volume":"9","author":[{"given":"Philipp","family":"Spitzer","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniel","family":"Hendriks","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jan","family":"Rudolph","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sarah","family":"Schlaeger","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jens","family":"Ricke","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Niklas","family":"K\u00fchl","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Boj Friedrich","family":"Hoppe","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefan","family":"Feuerriegel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,4,23]]},"reference":[{"key":"2619_CR1","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","volume":"29","author":"AJ Thirunavukarasu","year":"2023","unstructured":"Thirunavukarasu, 
A. J. et al. Large language models in medicine. Nature Medicine 29, 1930\u20131940 (2023).","journal-title":"Nature Medicine"},{"key":"2619_CR2","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-024-52415-1","volume":"15","author":"CYK Williams","year":"2024","unstructured":"Williams, C. Y. K., Miao, B. Y., Kornblith, A. E. & Butte, A. J. Evaluating the use of large language models to provide clinical recommendations in the Emergency Department. Nature Communications 15, 8236 (2024).","journal-title":"Nature Communications"},{"key":"2619_CR3","doi-asserted-by":"publisher","first-page":"AIp2300031","DOI":"10.1056\/AIp2300031","volume":"1","author":"AV Eriksen","year":"2024","unstructured":"Eriksen, A. V., M\u00f6ller, S. & Ryg, J. Use of GPT-4 to diagnose complex clinical cases. NEJM AI 1, AIp2300031 (2024).","journal-title":"NEJM AI"},{"key":"2619_CR4","doi-asserted-by":"publisher","first-page":"e2440969","DOI":"10.1001\/jamanetworkopen.2024.40969","volume":"7","author":"E Goh","year":"2024","unstructured":"Goh, E. et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Network Open 7, e2440969 (2024).","journal-title":"JAMA Network Open"},{"key":"2619_CR5","doi-asserted-by":"publisher","first-page":"2613","DOI":"10.1038\/s41591-024-03097-1","volume":"30","author":"P Hager","year":"2024","unstructured":"Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nature Medicine 30, 2613\u20132622 (2024).","journal-title":"Nature Medicine"},{"key":"2619_CR6","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1001\/jama.2023.8288","volume":"330","author":"Z Kanjee","year":"2023","unstructured":"Kanjee, Z., Crowe, B. & Rodman, A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. 
JAMA 330, 78\u201380 (2023).","journal-title":"JAMA"},{"key":"2619_CR7","doi-asserted-by":"publisher","first-page":"1233","DOI":"10.1038\/s41591-024-03456-y","volume":"31","author":"E Goh","year":"2025","unstructured":"Goh, E. et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nature Medicine 31, 1233\u20131238 (2025).","journal-title":"Nature Medicine"},{"key":"2619_CR8","doi-asserted-by":"publisher","first-page":"eadn9602","DOI":"10.1126\/science.adn9602","volume":"383","author":"EJ Topol","year":"2024","unstructured":"Topol, E. J. Toward the eradication of medical diagnostic errors. Science 383, eadn9602 (2024).","journal-title":"Science"},{"key":"2619_CR9","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1136\/bmjqs-2021-014130","volume":"33","author":"DE Newman-Toker","year":"2024","unstructured":"Newman-Toker, D. E. et al. Burden of serious harms from diagnostic error in the USA. BMJ Quality & Safety 33, 109\u2013120 (2024).","journal-title":"BMJ Quality & Safety"},{"key":"2619_CR10","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1056\/NEJMsa2206117","volume":"388","author":"DW Bates","year":"2023","unstructured":"Bates, D. W. et al. The safety of inpatient health care. New England Journal of Medicine 388, 142\u2013153 (2023).","journal-title":"New England Journal of Medicine"},{"key":"2619_CR11","unstructured":"Agency for Healthcare Research and Quality. Diagnostic errors in the emergency department: a systematic review. https:\/\/effectivehealthcare.ahrq.gov\/products\/diagnostic-errors-emergency-updated\/research."},{"key":"2619_CR12","unstructured":"Brodeur, P. G. et al. 
Superhuman performance of a large language model on the reasoning tasks of a physician http:\/\/arxiv.org\/abs\/2412.10849 (2024)."},{"key":"2619_CR13","doi-asserted-by":"publisher","first-page":"AIdbp2300092","DOI":"10.1056\/AIdbp2300092","volume":"1","author":"S Wu","year":"2024","unstructured":"Wu, S. et al. Benchmarking open-source large language models, GPT-4 and Claude 2 on multiple-choice questions in nephrology. NEJM AI 1, AIdbp2300092 (2024).","journal-title":"NEJM AI"},{"key":"2619_CR14","doi-asserted-by":"publisher","first-page":"e2347075","DOI":"10.1001\/jamanetworkopen.2023.47075","volume":"6","author":"A Rodman","year":"2023","unstructured":"Rodman, A., Buckley, T. A., Manrai, A. K. & Morgan, D. J. Artificial intelligence vs clinician performance in estimating probabilities of diagnoses before and after testing. JAMA Network Open 6, e2347075 (2023).","journal-title":"JAMA Network Open"},{"key":"2619_CR15","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1001\/jamaophthalmol.2023.6917","volume":"142","author":"AS Huang","year":"2024","unstructured":"Huang, A. S., Hirabayashi, K., Barna, L., Parikh, D. & Pasquale, L. R. Assessment of a large language model\u2019s responses to questions and cases about glaucoma and retina management. JAMA Ophthalmology 142, 371\u2013375 (2024).","journal-title":"JAMA Ophthalmology"},{"key":"2619_CR16","unstructured":"Sox, H. C. (ed.) Medical decision making (American College of Physicians, Philadelphia, 2007), ACP Press, 2007 edn."},{"key":"2619_CR17","doi-asserted-by":"publisher","first-page":"1214","DOI":"10.1001\/jama.281.13.1214","volume":"281","author":"WS Richardson","year":"1999","unstructured":"Richardson, W. S. et al. Users\u2019 guides to the medical literature: how to use an article about disease probability for differential diagnosis. JAMA 281, 1214\u20131219 (1999).","journal-title":"JAMA"},{"key":"2619_CR18","unstructured":"Wei, J. et al. 
Chain-of-Thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems https:\/\/arxiv.org\/abs\/2201.11903 (2022)."},{"key":"2619_CR19","first-page":"22199","volume":"35","author":"T Kojima","year":"2022","unstructured":"Kojima, T., Gu, S. S., Reid, M., Matsuo, Y. & Iwasawa, Y. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems 35, 22199\u201322213 (2022).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2619_CR20","unstructured":"Wang, X. et al. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations (2023)."},{"key":"2619_CR21","doi-asserted-by":"publisher","first-page":"e12","DOI":"10.1016\/S2589-7500(23)00225-X","volume":"6","author":"T Zack","year":"2024","unstructured":"Zack, T. et al. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: A model evaluation study. The Lancet Digital Health 6, e12\u2013e22 (2024).","journal-title":"The Lancet Digital Health"},{"key":"2619_CR22","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-021-00385-9","volume":"4","author":"S Gaube","year":"2021","unstructured":"Gaube, S. et al. Do as AI say: susceptibility in deployment of clinical decision-aids. npj Digital Medicine 4, 31 (2021).","journal-title":"npj Digital Medicine"},{"key":"2619_CR23","doi-asserted-by":"publisher","first-page":"837","DOI":"10.1038\/s41591-024-02850-w","volume":"30","author":"F Yu","year":"2024","unstructured":"Yu, F. et al. Heterogeneity and predictors of the effects of AI assistance on radiologists. Nature Medicine 30, 837\u2013849 (2024).","journal-title":"Nature Medicine"},{"key":"2619_CR24","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1093\/jamia\/ocw105","volume":"24","author":"D Lyell","year":"2017","unstructured":"Lyell, D. & Coiera, E. 
Automation bias and verification complexity: a systematic review. Journal of the American Medical Informatics Association 24, 423\u2013431 (2017).","journal-title":"Journal of the American Medical Informatics Association"},{"key":"2619_CR25","doi-asserted-by":"publisher","first-page":"103952","DOI":"10.1016\/j.artint.2023.103952","volume":"322","author":"M Vered","year":"2023","unstructured":"Vered, M., Livni, T., Howe, P. D. L., Miller, T. & Sonenberg, L. The effects of explanations on automation bias. Artificial Intelligence 322, 103952 (2023).","journal-title":"Artificial Intelligence"},{"key":"2619_CR26","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/s12599-023-00834-7","volume":"66","author":"S Feuerriegel","year":"2024","unstructured":"Feuerriegel, S., Hartmann, J., Janiesch, C. & Zschech, P. Generative AI. Business & Information Systems Engineering 66, 111\u2013126 (2024).","journal-title":"Business & Information Systems Engineering"},{"key":"2619_CR27","doi-asserted-by":"crossref","unstructured":"McKenna, N. et al. Sources of hallucination by large language models on inference tasks. In Findings of the Association for Computational Linguistics: EMNLP 2023, 2758\u20132774 https:\/\/aclanthology.org\/2023.findings-emnlp.182\/ (2023).","DOI":"10.18653\/v1\/2023.findings-emnlp.182"},{"key":"2619_CR28","doi-asserted-by":"publisher","first-page":"e0846","DOI":"10.1097\/RTI.0000000000000846","volume":"40","author":"T Cesur","year":"2025","unstructured":"Cesur, T., Gunes, Y. C., Camur, E. & Da\u011fli, M. Empowering radiologists with ChatGPT-4o: Comparative evaluation of large language models and radiologists in cardiac cases. Journal of Thoracic Imaging 40, e0846 (2025).","journal-title":"Journal of Thoracic Imaging"},{"key":"2619_CR29","doi-asserted-by":"crossref","unstructured":"Everett, S. S. et al. From tool to teammate: A randomized controlled trial of clinician-AI collaborative workflows for diagnosis. 
medRxiv (2025).","DOI":"10.1101\/2025.06.07.25329176"},{"key":"2619_CR30","doi-asserted-by":"crossref","unstructured":"Goh, E. et al. Influence of a large language model on diagnostic reasoning: A randomized clinical vignette study. medRxiv (2024).","DOI":"10.1101\/2024.03.12.24303785"},{"key":"2619_CR31","doi-asserted-by":"crossref","unstructured":"Qazi, I. A. et al. The impact of large language models on diagnostic reasoning among LLM-trained physicians: A randomized clinical trial (2025).","DOI":"10.1257\/rct.15117"},{"key":"2619_CR32","doi-asserted-by":"publisher","first-page":"e63857","DOI":"10.2196\/63857","volume":"9","author":"FA Weuthen","year":"2025","unstructured":"Weuthen, F. A., Otte, N., Krabbe, H., Kraus, T. & Krabbe, J. Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial. JMIR formative research 9, e63857 (2025).","journal-title":"JMIR formative research"},{"key":"2619_CR33","unstructured":"Nori, H., King, N., McKinney, S. M., Carignan, D. & Horvitz, E. Capabilities of GPT-4 on medical challenge problems. arXiv:2303.13375 (2023)."},{"key":"2619_CR34","doi-asserted-by":"crossref","unstructured":"Sun, E. X., Shi, J. & Mandell, J. C. (eds.) Core radiology: a visual approach to diagnostic imaging (Cambridge University Press, 2021), 2 edn. https:\/\/www.cambridge.org\/core\/product\/identifier\/9781108966450\/type\/book.","DOI":"10.1017\/9781108966450"},{"key":"2619_CR35","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1038\/s41586-024-07618-3","volume":"634","author":"MY Lu","year":"2024","unstructured":"Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature 634, 466\u2013473 (2024).","journal-title":"Nature"},{"key":"2619_CR36","doi-asserted-by":"publisher","first-page":"e2437711","DOI":"10.1001\/jamanetworkopen.2024.37711","volume":"7","author":"D Chen","year":"2024","unstructured":"Chen, D. et al. 
Performance of multimodal artificial intelligence chatbots evaluated on clinical oncology cases. JAMA Network Open 7, e2437711 (2024).","journal-title":"JAMA Network Open"},{"key":"2619_CR37","doi-asserted-by":"publisher","first-page":"900","DOI":"10.1038\/s41591-020-0842-3","volume":"26","author":"Y Liu","year":"2020","unstructured":"Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nature Medicine 26, 900\u2013908 (2020).","journal-title":"Nature Medicine"},{"key":"2619_CR38","doi-asserted-by":"publisher","first-page":"929","DOI":"10.1148\/radiol.2017171684","volume":"286","author":"AB Rosenkrantz","year":"2018","unstructured":"Rosenkrantz, A. B., Wang, W., Hughes, D. R. & Duszak, R. Generalist versus subspecialist characteristics of the U.S. radiologist workforce. Radiology 286, 929\u2013937 (2018).","journal-title":"Radiology"},{"key":"2619_CR39","doi-asserted-by":"publisher","first-page":"812","DOI":"10.1016\/j.jacr.2019.11.027","volume":"17","author":"AB Rosenkrantz","year":"2020","unstructured":"Rosenkrantz, A. B., Hughes, D. R. & Duszak, R. Increasing subspecialization of the national radiologist workforce. Journal of the American College of Radiology 17, 812\u2013818 (2020).","journal-title":"Journal of the American College of Radiology"},{"key":"2619_CR40","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1186\/s13244-023-01481-y","volume":"14","author":"M Rupreht","year":"2023","unstructured":"Rupreht, M., Ricci, P., Prosch, H. & Adriaensen, M. E. A. P. M. Subspecialisation in radiology in Europe, a survey of the accreditation council of imaging. Insights into Imaging 14, 159 (2023).","journal-title":"Insights into Imaging"},{"key":"2619_CR41","doi-asserted-by":"publisher","first-page":"e101102","DOI":"10.1136\/bmjhci-2024-101102","volume":"31","author":"CR Blease","year":"2024","unstructured":"Blease, C. R., Locher, C., Gaab, J., H\u00e4gglund, M. & Mandl, K. D. 
Generative artificial intelligence in primary care: An online survey of UK general practitioners. BMJ Health & Care Informatics 31, e101102 (2024).","journal-title":"BMJ Health & Care Informatics"},{"key":"2619_CR42","unstructured":"McDuff, D. et al. Towards accurate differential diagnosis with large language models. Nature https:\/\/www.nature.com\/articles\/s41586-025-08869-4 (2025)."},{"key":"2619_CR43","doi-asserted-by":"crossref","unstructured":"Schemmer, M., K\u00fchl, N., Benz, C., Bartos, A. & Satzger, G. Appropriate reliance on AI advice: Conceptualization and the effect of explanations. In Proceedings of the 28th International Conference on Intelligent User Interfaces, 410\u2013422 http:\/\/arxiv.org\/abs\/2302.02187 (2023).","DOI":"10.1145\/3581641.3584066"},{"key":"2619_CR44","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1146\/annurev.psych.49.1.447","volume":"49","author":"BA Mellers","year":"1998","unstructured":"Mellers, B. A., Schwartz, A. & Cooke, A. D. Judgment and decision making. Annual Review of Psychology 49, 447\u2013477 (1998).","journal-title":"Annual Review of Psychology"},{"key":"2619_CR45","first-page":"330","volume":"4","author":"B Fischhoff","year":"1978","unstructured":"Fischhoff, B., Slovic, P. & Lichtenstein, S. Fault trees: Sensitivity of estimated failure probabilities to problem representation. Journal of Experimental Psychology: Human Perception and Performance 4, 330\u2013344 (1978).","journal-title":"Journal of Experimental Psychology: Human Perception and Performance"},{"key":"2619_CR46","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1038\/s44159-024-00392-z","volume":"4","author":"S Feuerriegel","year":"2025","unstructured":"Feuerriegel, S. et al. Using natural language processing to analyse text data in behavioural science. 
Nature Reviews Psychology 4, 96\u2013111 (2025).","journal-title":"Nature Reviews Psychology"},{"key":"2619_CR47","doi-asserted-by":"publisher","first-page":"2629","DOI":"10.1007\/s10439-023-03272-4","volume":"51","author":"L Giray","year":"2023","unstructured":"Giray, L. Prompt engineering with ChatGPT: a guide for academic writers. Annals of Biomedical Engineering 51, 2629\u20132633 (2023).","journal-title":"Annals of Biomedical Engineering"},{"key":"2619_CR48","unstructured":"Nori, H. et al. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv:2311.16452 (2023)."},{"key":"2619_CR49","doi-asserted-by":"publisher","first-page":"e49","DOI":"10.3399\/bjgp15X683161","volume":"65","author":"O Kostopoulou","year":"2015","unstructured":"Kostopoulou, O. et al. Early diagnostic suggestions improve accuracy of GPs: a randomised controlled trial using computer-simulated patients. British Journal of General Practice 65, e49\u2013e54 (2015).","journal-title":"British Journal of General Practice"},{"key":"2619_CR50","doi-asserted-by":"publisher","first-page":"1833","DOI":"10.1038\/s41562-023-01721-7","volume":"7","author":"M Ghassemi","year":"2023","unstructured":"Ghassemi, M. Presentation matters for AI-generated clinical advice. Nature Human Behaviour 7, 1833\u20131835 (2023).","journal-title":"Nature Human Behaviour"},{"key":"2619_CR51","doi-asserted-by":"publisher","first-page":"3098","DOI":"10.1038\/s41591-024-03180-7","volume":"30","author":"M Reis","year":"2024","unstructured":"Reis, M., Reis, F. & Kunde, W. Influence of believed AI involvement on the perception of digital medical advice. Nature Medicine 30, 3098\u20133100 (2024).","journal-title":"Nature Medicine"},{"key":"2619_CR52","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-023-00955-z","volume":"6","author":"M Nagendran","year":"2023","unstructured":"Nagendran, M., Festor, P., Komorowski, M., Gordon, A. C. & Faisal, A. A. 
Quantifying the impact of AI recommendations with explanations on prescription decision making. npj Digital Medicine 6, 206 (2023).","journal-title":"npj Digital Medicine"},{"key":"2619_CR53","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-023-28633-w","volume":"13","author":"S Gaube","year":"2023","unstructured":"Gaube, S. et al. Non-task expert physicians benefit from correct explainable AI advice when reviewing X-rays. Scientific Reports 13, 1383 (2023).","journal-title":"Scientific Reports"},{"key":"2619_CR54","doi-asserted-by":"publisher","first-page":"2275","DOI":"10.1001\/jama.2023.22295","volume":"330","author":"S Jabbour","year":"2023","unstructured":"Jabbour, S. et al. Measuring the impact of AI in the diagnosis of hospitalized patients: a randomized clinical vignette survey study. JAMA 330, 2275\u20132284 (2023).","journal-title":"JAMA"},{"key":"2619_CR55","doi-asserted-by":"publisher","first-page":"1229","DOI":"10.1038\/s41591-020-0942-0","volume":"26","author":"P Tschandl","year":"2020","unstructured":"Tschandl, P. et al. Human-computer collaboration for skin cancer recognition. Nature Medicine 26, 1229\u20131234 (2020).","journal-title":"Nature Medicine"},{"key":"2619_CR56","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-024-82501-9","volume":"14","author":"J Senoner","year":"2024","unstructured":"Senoner, J., Schallmoser, S., Kratzwald, B., Feuerriegel, S. & Netland, T. Explainable AI improves task performance in human-AI collaboration. Scientific Reports 14, 31150 (2024).","journal-title":"Scientific Reports"},{"key":"2619_CR57","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1148\/rg.2018170107","volume":"38","author":"LP Busby","year":"2018","unstructured":"Busby, L. P., Courtier, J. L. & Glastonbury, C. M. Bias in radiology: the how and why of misses and misinterpretations. 
RadioGraphics 38, 236\u2013247 (2018).","journal-title":"RadioGraphics"},{"key":"2619_CR58","doi-asserted-by":"publisher","first-page":"611","DOI":"10.2214\/AJR.12.10375","volume":"201","author":"CS Lee","year":"2013","unstructured":"Lee, C. S., Nagy, P. G., Weaver, S. J. & Newman-Toker, D. E. Cognitive and system factors contributing to diagnostic errors in radiology. American Journal of Roentgenology 201, 611\u2013617 (2013).","journal-title":"American Journal of Roentgenology"},{"key":"2619_CR59","doi-asserted-by":"publisher","first-page":"2176","DOI":"10.1038\/s41591-021-01595-0","volume":"27","author":"L Seyyed-Kalantari","year":"2021","unstructured":"Seyyed-Kalantari, L., Zhang, H., McDermott, M. B., Chen, I. Y. & Ghassemi, M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nature Medicine 27, 2176\u20132182 (2021).","journal-title":"Nature Medicine"},{"key":"2619_CR60","doi-asserted-by":"publisher","first-page":"AIcs2400639","DOI":"10.1056\/AIcs2400639","volume":"1","author":"J Wang","year":"2024","unstructured":"Wang, J. & Redelmeier, D. A. Cognitive biases and artificial intelligence. NEJM AI 1, AIcs2400639 (2024).","journal-title":"NEJM AI"},{"key":"2619_CR61","doi-asserted-by":"crossref","unstructured":"Madsen, A., Chandar, S. & Reddy, S. Are self-explanations from large language models faithful? In Findings of the ACL, 295\u2013337 (2024).","DOI":"10.18653\/v1\/2024.findings-acl.19"},{"key":"2619_CR62","doi-asserted-by":"publisher","first-page":"e2325000","DOI":"10.1001\/jamanetworkopen.2023.25000","volume":"6","author":"Y-F Shea","year":"2023","unstructured":"Shea, Y.-F., Lee, C. M. Y., Ip, W. C. T., Luk, D. W. A. & Wong, S. S. W. Use of GPT-4 to analyze medical records of patients with extensive investigations and delayed diagnosis. 
JAMA Network Open 6, e2325000 (2023).","journal-title":"JAMA Network Open"},{"key":"2619_CR63","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1001\/jamainternmed.2024.0295","volume":"184","author":"S Cabral","year":"2024","unstructured":"Cabral, S. et al. Clinical reasoning of a generative artificial intelligence model compared with physicians. JAMA Internal Medicine 184, 581\u2013583 (2024).","journal-title":"JAMA Internal Medicine"},{"key":"2619_CR64","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-024-01185-7","volume":"7","author":"Q Jin","year":"2024","unstructured":"Jin, Q. et al. Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine. npj Digital Medicine 7, 190 (2024).","journal-title":"npj Digital Medicine"},{"key":"2619_CR65","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-024-01208-3","volume":"7","author":"R Kaczmarczyk","year":"2024","unstructured":"Kaczmarczyk, R., Wilhelm, T. I., Martin, R. & Roos, J. Evaluating multimodal AI in medical diagnostics. npj Digital Medicine 7, 1\u20135 (2024).","journal-title":"npj Digital Medicine"},{"key":"2619_CR66","unstructured":"Saab, K. et al. Capabilities of Gemini models in medicine http:\/\/arxiv.org\/abs\/2404.18416 (2024)."},{"key":"2619_CR67","doi-asserted-by":"publisher","first-page":"e241668","DOI":"10.1148\/radiol.241668","volume":"313","author":"PS Suh","year":"2024","unstructured":"Suh, P. S. et al. Comparing large language model and human reader accuracy with New England Journal of Medicine Image Challenge case image inputs. Radiology 313, e241668 (2024).","journal-title":"Radiology"},{"key":"2619_CR68","unstructured":"Trinh, T. H. & Le, Q. V. A simple method for commonsense reasoning https:\/\/arxiv.org\/abs\/1806.02847 (2018)."},{"key":"2619_CR69","unstructured":"Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019)."},{"key":"2619_CR70","unstructured":"Brown, T. et al. 
Language models are few-shot learners. In Advances in Neural Information Processing Systems, vol. 33, 1877\u20131901 https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf (2020)."},{"key":"2619_CR71","doi-asserted-by":"publisher","first-page":"e47532","DOI":"10.2196\/47532","volume":"9","author":"N Ito","year":"2023","unstructured":"Ito, N. et al. The accuracy and potential racial and ethnic biases of GPT-4 in the diagnosis and triage of health conditions: Evaluation study. JMIR Medical Education 9, e47532 (2023).","journal-title":"JMIR Medical Education"},{"key":"2619_CR72","unstructured":"The New England Journal of Medicine: Image Challenge https:\/\/www.nejm.org\/image-challenge (2025)."},{"key":"2619_CR73","first-page":"46","volume":"13","author":"JW Ratcliff","year":"1988","unstructured":"Ratcliff, J. W., Metzener, D. E. et al. Pattern matching: The gestalt approach. Dr. Dobb\u2019s Journal 13, 46 (1988).","journal-title":"Dr. Dobb\u2019s Journal"},{"key":"2619_CR74","doi-asserted-by":"crossref","unstructured":"Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: A Method for Automatic Evaluation of Machine Translation. In Isabelle, P., Charniak, E. & Lin, D. (eds.) Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 311\u2013318 (Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 2002).","DOI":"10.3115\/1073083.1073135"},{"key":"2619_CR75","unstructured":"MSI-ACI. Market research service. 
https:\/\/site.msi-aci.com\/ Accessed: February 9, 2026 (2025)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02619-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02619-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02619-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T18:02:07Z","timestamp":1776967327000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-026-02619-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,23]]},"references-count":75,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["2619"],"URL":"https:\/\/doi.org\/10.1038\/s41746-026-02619-0","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,23]]},"assertion":[{"value":"9 December 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 March 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 April 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"333"}}