{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T03:38:00Z","timestamp":1778816280934,"version":"3.51.4"},"reference-count":121,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,3,22]],"date-time":"2025-03-22T00:00:00Z","timestamp":1742601600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,22]],"date-time":"2025-03-22T00:00:00Z","timestamp":1742601600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    While generative artificial intelligence (AI) has shown potential in medical diagnostics, comprehensive evaluation of its diagnostic performance and comparison with physicians has not been extensively explored. We conducted a systematic review and meta-analysis of studies validating generative AI models for diagnostic tasks published between June 2018 and June 2024. Analysis of 83 studies revealed an overall diagnostic accuracy of 52.1%. No significant performance difference was found between AI models and physicians overall (\n                    <jats:italic>p<\/jats:italic>\n                    \u2009=\u20090.10) or non-expert physicians (\n                    <jats:italic>p<\/jats:italic>\n                    \u2009=\u20090.93). However, AI models performed significantly worse than expert physicians (\n                    <jats:italic>p<\/jats:italic>\n                    \u2009=\u20090.007). Several models demonstrated slightly higher performance compared to non-experts, although the differences were not significant. Generative AI demonstrates promising diagnostic capabilities with accuracy varying by model. Although it has not yet achieved expert-level reliability, these findings suggest potential for enhancing healthcare delivery and medical education when implemented with appropriate understanding of its limitations.\n                  <\/jats:p>","DOI":"10.1038\/s41746-025-01543-z","type":"journal-article","created":{"date-parts":[[2025,3,22]],"date-time":"2025-03-22T14:46:09Z","timestamp":1742654769000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":104,"title":["A systematic review and meta-analysis of diagnostic performance comparison between generative AI and physicians"],"prefix":"10.1038","volume":"8","author":[{"given":"Hirotaka","family":"Takita","sequence":"first","affiliation":[]},{"given":"Daijiro","family":"Kabata","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7268-8313","authenticated-orcid":false,"given":"Shannon L.","family":"Walston","sequence":"additional","affiliation":[]},{"given":"Hiroyuki","family":"Tatekawa","sequence":"additional","affiliation":[]},{"given":"Kenichi","family":"Saito","sequence":"additional","affiliation":[]},{"given":"Yasushi","family":"Tsujimoto","sequence":"additional","affiliation":[]},{"given":"Yukio","family":"Miki","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3878-3616","authenticated-orcid":false,"given":"Daiju","family":"Ueda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,22]]},"reference":[{"key":"1543_CR1","unstructured":"Radford, A., Narasimhan, K., Salimans, T. & Sutskever, I. Improving Language Understanding by Generative Pre-training. https:\/\/www.mikecaptain.com\/resources\/pdf\/GPT-1.pdf (2018)."},{"key":"1543_CR2","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877\u20131901 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"1543_CR3","doi-asserted-by":"publisher","unstructured":"OpenAI et al. GPT-4 technical report. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2303.08774 (2023).","DOI":"10.48550\/arXiv.2303.08774"},{"key":"1543_CR4","doi-asserted-by":"publisher","unstructured":"Touvron, H. et al. LLaMA: open and efficient foundation language models. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2302.13971 (2023).","DOI":"10.48550\/arXiv.2302.13971"},{"key":"1543_CR5","doi-asserted-by":"publisher","unstructured":"Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2307.09288 (2023).","DOI":"10.48550\/arXiv.2307.09288"},{"key":"1543_CR6","first-page":"1","volume":"24","author":"A Chowdhery","year":"2023","unstructured":"Chowdhery, A. et al. PaLM: scaling language modeling with pathways. J. Mach. Learn. Res. 24, 1\u2013113 (2023).","journal-title":"J. Mach. Learn. Res."},{"key":"1543_CR7","doi-asserted-by":"publisher","unstructured":"Anil, R. et al. PaLM 2 technical report. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2305.10403 (2023).","DOI":"10.48550\/arXiv.2305.10403"},{"key":"1543_CR8","doi-asserted-by":"publisher","unstructured":"Thoppilan, R. et al. LaMDA: language models for dialog applications. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2201.08239 (2022).","DOI":"10.48550\/arXiv.2201.08239"},{"key":"1543_CR9","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","volume":"29","author":"AJ Thirunavukarasu","year":"2023","unstructured":"Thirunavukarasu, A. J. et al. Large language models in medicine. Nat. Med. 29, 1930\u20131940 (2023).","journal-title":"Nat. Med."},{"key":"1543_CR10","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","volume":"620","author":"K Singhal","year":"2023","unstructured":"Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172\u2013180 (2023).","journal-title":"Nature"},{"key":"1543_CR11","doi-asserted-by":"publisher","DOI":"10.1148\/radiol.231040","volume":"308","author":"D Ueda","year":"2023","unstructured":"Ueda, D. et al. ChatGPT\u2019s diagnostic performance from patient history and imaging findings on the diagnosis please quizzes. Radiology 308, e231040 (2023).","journal-title":"Radiology"},{"key":"1543_CR12","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1001\/jama.2023.8288","volume":"330","author":"Z Kanjee","year":"2023","unstructured":"Kanjee, Z., Crowe, B. & Rodman, A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 330, 78\u201380 (2023).","journal-title":"JAMA"},{"key":"1543_CR13","doi-asserted-by":"publisher","first-page":"1119","DOI":"10.1016\/j.amjmed.2023.08.003","volume":"136","author":"T Hirosawa","year":"2023","unstructured":"Hirosawa, T., Mizuta, K., Harada, Y. & Shimizu, T. Comparative evaluation of diagnostic accuracy between Google Bard and physicians. Am. J. Med. 136, 1119\u20131123.e18 (2023).","journal-title":"Am. J. Med."},{"key":"1543_CR14","doi-asserted-by":"publisher","first-page":"e2325000","DOI":"10.1001\/jamanetworkopen.2023.25000","volume":"6","author":"Y-F Shea","year":"2023","unstructured":"Shea, Y.-F., Lee, C. M. Y., Ip, W. C. T., Luk, D. W. A. & Wong, S. S. W. Use of GPT-4 to analyze medical records of patients with extensive investigations and delayed diagnosis. JAMA Netw. Open 6, e2325000 (2023).","journal-title":"JAMA Netw. Open"},{"key":"1543_CR15","doi-asserted-by":"publisher","first-page":"4687","DOI":"10.1007\/s00405-023-08135-1","volume":"280","author":"J Chee","year":"2023","unstructured":"Chee, J., Kwa, E. D. & Goh, X. Vertigo, likely peripheral\u2019: the dizzying rise of ChatGPT. Eur. Arch. Otorhinolaryngol. 280, 4687\u20134689 (2023).","journal-title":"Eur. Arch. Otorhinolaryngol."},{"key":"1543_CR16","doi-asserted-by":"publisher","unstructured":"Lyons, R. J., Arepalli, S. R., Fromal, O., Choi, J. D. & Jain, N. Artificial intelligence chatbot performance in triage of ophthalmic conditions. Can. J. Ophthalmol. https:\/\/doi.org\/10.1016\/j.jcjo.2023.07.016 (2023).","DOI":"10.1016\/j.jcjo.2023.07.016"},{"key":"1543_CR17","doi-asserted-by":"publisher","unstructured":"Benoit, J. R. A. ChatGPT for clinical vignette generation, revision, and evaluation. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.02.04.23285478 (2023).","DOI":"10.1101\/2023.02.04.23285478"},{"key":"1543_CR18","doi-asserted-by":"publisher","first-page":"e48808","DOI":"10.2196\/48808","volume":"11","author":"T Hirosawa","year":"2023","unstructured":"Hirosawa, T. et al. ChatGPT-generated differential diagnosis lists for complex case-derived clinical vignettes: diagnostic accuracy evaluation. JMIR Med. Inf. 11, e48808 (2023).","journal-title":"JMIR Med. Inf."},{"key":"1543_CR19","doi-asserted-by":"publisher","first-page":"3378","DOI":"10.3390\/ijerph20043378","volume":"20","author":"T Hirosawa","year":"2023","unstructured":"Hirosawa, T. et al. Diagnostic accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 chatbot for clinical vignettes with common chief complaints: a pilot study. Int. J. Environ. Res. Public Health 20, 3378 (2023).","journal-title":"Int. J. Environ. Res. Public Health"},{"key":"1543_CR20","doi-asserted-by":"publisher","first-page":"115351","DOI":"10.1016\/j.psychres.2023.115351","volume":"327","author":"Q Wei","year":"2023","unstructured":"Wei, Q., Cui, Y., Wei, B., Cheng, Q. & Xu, X. Evaluating the performance of ChatGPT in differential diagnosis of neurodevelopmental disorders: a pediatricians-machine comparison. Psychiatry Res. 327, 115351 (2023).","journal-title":"Psychiatry Res."},{"key":"1543_CR21","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1159\/000533177","volume":"88","author":"L Allahqoli","year":"2023","unstructured":"Allahqoli, L., Ghiasvand, M. M., Mazidimoradi, A., Salehiniya, H. & Alkatout, I. Diagnostic and management performance of ChatGPT in obstetrics and gynecology. Gynecol. Obstet. Invest. 88, 310\u2013313 (2023).","journal-title":"Gynecol. Obstet. Invest."},{"key":"1543_CR22","doi-asserted-by":"publisher","first-page":"2283","DOI":"10.14309\/ajg.0000000000002483","volume":"118","author":"A Levartovsky","year":"2023","unstructured":"Levartovsky, A., Ben-Horin, S., Kopylov, U., Klang, E. & Barash, Y. Towards AI-augmented clinical decision-making: an examination of ChatGPT\u2019s utility in acute ulcerative colitis presentations. Am. J. Gastroenterol. 118, 2283\u20132289 (2023).","journal-title":"Am. J. Gastroenterol."},{"key":"1543_CR23","doi-asserted-by":"publisher","DOI":"10.1007\/s10916-023-02019-x","volume":"47","author":"S Bushuven","year":"2023","unstructured":"Bushuven, S. et al. \u2018ChatGPT, can you help me save my child\u2019s life?\u2019\u2014diagnostic accuracy and supportive capabilities to lay rescuers by ChatGPT in prehospital basic life support and paediatric advanced life support cases\u2014an in-silico analysis. J. Med. Syst. 47, 123 (2023).","journal-title":"J. Med. Syst."},{"key":"1543_CR24","doi-asserted-by":"publisher","unstructured":"Knebel, D. et al. Assessment of ChatGPT in the prehospital management of ophthalmological emergencies\u2014an analysis of 10 fictional case vignettes. Klin. Monbl. Augenheilkd. https:\/\/doi.org\/10.1055\/a-2149-0447 (2023).","DOI":"10.1055\/a-2149-0447"},{"key":"1543_CR25","doi-asserted-by":"publisher","first-page":"100213","DOI":"10.1016\/j.jtauto.2023.100213","volume":"7","author":"J Pillai","year":"2023","unstructured":"Pillai, J. & Pillai, K. Accuracy of generative artificial intelligence models in differential diagnoses of familial Mediterranean fever and deficiency of Interleukin-1 receptor antagonist. J. Transl. Autoimmun. 7, 100213 (2023).","journal-title":"J. Transl. Autoimmun."},{"key":"1543_CR26","doi-asserted-by":"publisher","first-page":"e47532","DOI":"10.2196\/47532","volume":"9","author":"N Ito","year":"2023","unstructured":"Ito, N. et al. The accuracy and potential racial and ethnic biases of GPT-4 in the diagnosis and triage of health conditions: evaluation study. JMIR Med. Educ. 9, e47532 (2023).","journal-title":"JMIR Med. Educ."},{"key":"1543_CR27","doi-asserted-by":"publisher","unstructured":"Sorin, V. et al. GPT-4 multimodal analysis on ophthalmology clinical cases including text and images. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.11.24.23298953 (2023).","DOI":"10.1101\/2023.11.24.23298953"},{"key":"1543_CR28","doi-asserted-by":"publisher","unstructured":"Madadi, Y. et al. ChatGPT assisting diagnosis of neuro-ophthalmology diseases based on case reports. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.09.13.23295508 (2023).","DOI":"10.1101\/2023.09.13.23295508"},{"key":"1543_CR29","doi-asserted-by":"publisher","unstructured":"Schubert, M. C., Lasotta, M., Sahm, F., Wick, W. & Venkataramani, V. Evaluating the multimodal capabilities of generative AI in complex clinical diagnostics. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.11.01.23297938 (2023).","DOI":"10.1101\/2023.11.01.23297938"},{"key":"1543_CR30","doi-asserted-by":"publisher","unstructured":"Kiyohara, Y. et al. Large language models to differentiate vasospastic angina using patient information. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.06.26.23291913 (2023).","DOI":"10.1101\/2023.06.26.23291913"},{"key":"1543_CR31","first-page":"e47594","volume":"15","author":"I Sultan","year":"2023","unstructured":"Sultan, I. et al. Using ChatGPT to predict cancer predisposition genes: a promising tool for pediatric oncologists. Cureus 15, e47594 (2023).","journal-title":"Cureus"},{"key":"1543_CR32","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1007\/s00234-023-03252-4","volume":"66","author":"D Horiuchi","year":"2023","unstructured":"Horiuchi, D. et al. Accuracy of ChatGPT generated diagnosis from patient\u2019s medical history and imaging findings in neuroradiology cases. Neuroradiology 66, 73\u201379 (2023).","journal-title":"Neuroradiology"},{"key":"1543_CR33","doi-asserted-by":"publisher","unstructured":"Stoneham, S., Livesey, A., Cooper, H. & Mitchell, C. Chat GPT vs clinician: challenging the diagnostic capabilities of A.I. In dermatology. Clin. Exp. Dermatol. https:\/\/doi.org\/10.1093\/ced\/llad402 (2023).","DOI":"10.1093\/ced\/llad402"},{"key":"1543_CR34","doi-asserted-by":"publisher","unstructured":"Rundle, C. W., Szeto, M. D., Presley, C. L., Shahwan, K. T. & Carr, D. R. Analysis of ChatGPT generated differential diagnoses in response to physical exam findings for benign and malignant cutaneous neoplasms. J. Am. Acad. Dermatol. https:\/\/doi.org\/10.1016\/j.jaad.2023.10.040 (2023).","DOI":"10.1016\/j.jaad.2023.10.040"},{"key":"1543_CR35","doi-asserted-by":"crossref","unstructured":"Rojas-Carabali, W. et al. Chatbots Vs. human experts: evaluating diagnostic performance of chatbots in uveitis and the perspectives on AI adoption in ophthalmology. Ocul. Immunol. Inflamm. 32, 1591\u20131598 (2024).","DOI":"10.1080\/09273948.2023.2266730"},{"key":"1543_CR36","doi-asserted-by":"publisher","first-page":"e49995","DOI":"10.2196\/49995","volume":"11","author":"H Fraser","year":"2023","unstructured":"Fraser, H. et al. Comparison of diagnostic and triage accuracy of Ada Health and WebMD symptom checkers, ChatGPT, and physicians for patients in an emergency department: clinical data analysis study. JMIR Mhealth Uhealth 11, e49995 (2023).","journal-title":"JMIR Mhealth Uhealth"},{"key":"1543_CR37","doi-asserted-by":"publisher","unstructured":"Krusche, M., Callhoff, J., Knitza, J. & Ruffer, N. Diagnostic accuracy of a large language model in rheumatology: comparison of physician and ChatGPT-4. Rheumatol. Int. https:\/\/doi.org\/10.1007\/s00296-023-05464-6 (2023).","DOI":"10.1007\/s00296-023-05464-6"},{"key":"1543_CR38","doi-asserted-by":"publisher","first-page":"120804","DOI":"10.1016\/j.jns.2023.120804","volume":"453","author":"K Galetta","year":"2023","unstructured":"Galetta, K. & Meltzer, E. Does GPT-4 have neurophobia? Localization and diagnostic accuracy of an artificial intelligence-powered chatbot in clinical vignettes. J. Neurol. Sci. 453, 120804 (2023).","journal-title":"J. Neurol. Sci."},{"key":"1543_CR39","doi-asserted-by":"publisher","first-page":"3121","DOI":"10.1007\/s40123-023-00805-x","volume":"12","author":"M Delsoz","year":"2023","unstructured":"Delsoz, M. et al. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol. Ther. 12, 3121\u20133132 (2023).","journal-title":"Ophthalmol. Ther."},{"key":"1543_CR40","doi-asserted-by":"publisher","first-page":"3395","DOI":"10.1007\/s40123-023-00789-8","volume":"12","author":"X Hu","year":"2023","unstructured":"Hu, X. et al. What can GPT-4 do for diagnosing rare eye diseases? A pilot study. Ophthalmol. Ther. 12, 3395\u20133402 (2023).","journal-title":"Ophthalmol. Ther."},{"key":"1543_CR41","doi-asserted-by":"publisher","first-page":"2407","DOI":"10.1007\/s00266-023-03538-1","volume":"47","author":"J Abi-Rafeh","year":"2023","unstructured":"Abi-Rafeh, J., Hanna, S., Bassiri-Tehrani, B., Kazan, R. & Nahai, F. Complications following facelift and neck lift: implementation and assessment of large language model and artificial intelligence (ChatGPT) performance across 16 simulated patient presentations. Aesthetic Plast. Surg. 47, 2407\u20132414 (2023).","journal-title":"Aesthetic Plast. Surg."},{"key":"1543_CR42","doi-asserted-by":"crossref","unstructured":"Koga, S., Martin, N. B. & Dickson, D. W. Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders. Brain Pathol. 34, e13207 (2023).","DOI":"10.1111\/bpa.13207"},{"key":"1543_CR43","doi-asserted-by":"publisher","first-page":"2569","DOI":"10.1007\/s00345-023-04539-0","volume":"41","author":"Y Xv","year":"2023","unstructured":"Xv, Y., Peng, C., Wei, Z., Liao, F. & Xiao, M. Can Chat-GPT a substitute for urological resident physician in diagnosing diseases?: a preliminary conclusion from an exploratory investigation. World J. Urol. 41, 2569\u20132571 (2023).","journal-title":"World J. Urol."},{"key":"1543_CR44","doi-asserted-by":"publisher","unstructured":"Senthujan, S. M. et al. GPT-4V(ision) unsuitable for clinical care and education: a clinician-evaluated assessment. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.11.15.23298575 (2023).","DOI":"10.1101\/2023.11.15.23298575"},{"key":"1543_CR45","doi-asserted-by":"crossref","unstructured":"Mori, Y., Izumiyama, T., Kanabuchi, R., Mori, N. & Aizawa, T. Large language model may assist diagnosis of SAPHO syndrome by bone scintigraphy. Mod. Rheumatol. 34, 1043\u20131046 (2024).","DOI":"10.1093\/mr\/road115"},{"key":"1543_CR46","doi-asserted-by":"publisher","first-page":"2345","DOI":"10.36740\/WLek202311101","volume":"76","author":"Y Mykhalko","year":"2023","unstructured":"Mykhalko, Y., Kish, P., Rubtsova, Y., Kutsyn, O. & Koval, V. From text to diagnose: ChatGPT\u2019S efficacy in medical decision-making. Wiad. Lek. 76, 2345\u20132350 (2023).","journal-title":"Wiad. Lek."},{"key":"1543_CR47","first-page":"439","volume":"159","author":"CA Andrade-Castellanos","year":"2023","unstructured":"Andrade-Castellanos, C. A., Paz, M. T. T.l.a. & Farf\u00e1n-Flores, P. E. Accuracy of ChatGPT for the diagnosis of clinical entities in the field of internal medicine. Gac. Med. Mex. 159, 439\u2013442 (2023).","journal-title":"Gac. Med. Mex."},{"key":"1543_CR48","doi-asserted-by":"publisher","first-page":"2534","DOI":"10.1016\/j.jseint.2023.07.018","volume":"7","author":"M Daher","year":"2023","unstructured":"Daher, M. et al. Breaking barriers: can ChatGPT compete with a shoulder and elbow specialist in diagnosis and management? JSES Int. 7, 2534\u20132541 (2023).","journal-title":"JSES Int."},{"key":"1543_CR49","volume":"15","author":"PP Suthar","year":"2023","unstructured":"Suthar, P. P., Kounsal, A., Chhetri, L., Saini, D. & Dua, S. G. Artificial intelligence (AI) in radiology: a deep dive into ChatGPT 4.0\u2019s accuracy with the American Journal of Neuroradiology\u2019s (AJNR) \u2018Case of the Month\u2019. Cureus 15, e43958 (2023).","journal-title":"Cureus"},{"key":"1543_CR50","first-page":"1","volume":"15","author":"T Nakaura","year":"2023","unstructured":"Nakaura, T. et al. Preliminary assessment of automated radiology report generation with generative pre-trained transformers: comparing results to radiologist-generated reports. Jpn. J. Radiol. 15, 1\u201311 (2023).","journal-title":"Jpn. J. Radiol."},{"key":"1543_CR51","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1016\/j.annemergmed.2023.08.003","volume":"83","author":"HT Berg","year":"2024","unstructured":"Berg, H. T. et al. ChatGPT and generating a differential diagnosis early in an emergency department presentation. Ann. Emerg. Med. 83, 83\u201386 (2024).","journal-title":"Ann. Emerg. Med."},{"key":"1543_CR52","doi-asserted-by":"publisher","first-page":"3717","DOI":"10.3390\/cancers15143717","volume":"15","author":"G Gebrael","year":"2023","unstructured":"Gebrael, G. et al. Enhancing triage efficiency and accuracy in emergency rooms for patients with metastatic prostate cancer: a retrospective analysis of artificial intelligence-assisted triage using ChatGPT 4.0. Cancers 15, 3717 (2023).","journal-title":"Cancers"},{"key":"1543_CR53","doi-asserted-by":"publisher","first-page":"e547","DOI":"10.1111\/ijd.16746","volume":"62","author":"A Ravipati","year":"2023","unstructured":"Ravipati, A., Pradeep, T. & Elman, S. A. The role of artificial intelligence in dermatology: the promising but limited accuracy of ChatGPT in diagnosing clinical scenarios. Int. J. Dermatol. 62, e547\u2013e548 (2023).","journal-title":"Int. J. Dermatol."},{"key":"1543_CR54","doi-asserted-by":"publisher","first-page":"e58758","DOI":"10.2196\/58758","volume":"10","author":"K Shikino","year":"2024","unstructured":"Shikino, K. et al. Evaluation of ChatGPT-generated differential diagnosis for common diseases with atypical presentation: descriptive research. JMIR Med. Educ. 10, e58758 (2024).","journal-title":"JMIR Med. Educ."},{"key":"1543_CR55","doi-asserted-by":"crossref","unstructured":"Horiuchi, D. et al. Comparing the diagnostic performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists in challenging neuroradiology cases. Clin. Neuroradiol. 34, 779\u2013787 (2024).","DOI":"10.1007\/s00062-024-01426-y"},{"key":"1543_CR56","doi-asserted-by":"publisher","first-page":"e1083","DOI":"10.1016\/j.wneu.2024.05.052","volume":"187","author":"RP Kumar","year":"2024","unstructured":"Kumar, R. P. et al. Can artificial intelligence mitigate missed diagnoses by generating differential diagnoses for neurosurgeons? World Neurosurg. 187, e1083\u2013e1088 (2024).","journal-title":"World Neurosurg."},{"key":"1543_CR57","doi-asserted-by":"publisher","DOI":"10.2196\/53724","volume":"26","author":"WHK Chiu","year":"2024","unstructured":"Chiu, W. H. K. et al. Evaluating the diagnostic performance of large language models on complex multimodal medical cases. J. Med. Internet Res. 26, e53724 (2024).","journal-title":"J. Med. Internet Res."},{"key":"1543_CR58","first-page":"1506","volume":"45","author":"T Kikuchi","year":"2024","unstructured":"Kikuchi, T. et al. Toward improved radiologic diagnostics: investigating the utility and limitations of GPT-3.5 Turbo and GPT-4 with quiz cases. AJNR Am. J. Neuroradiol. 45, 1506\u20131511 (2024).","journal-title":"AJNR Am. J. Neuroradiol."},{"key":"1543_CR59","first-page":"250","volume":"11","author":"JM Bridges","year":"2024","unstructured":"Bridges, J. M. Computerized diagnostic decision support systems\u2014a comparative performance study of Isabel Pro vs. ChatGPT4. Acta Radiol. Diagn. 11, 250\u2013258 (2024).","journal-title":"Acta Radiol. Diagn."},{"key":"1543_CR60","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-024-58760-x","volume":"14","author":"A Shieh","year":"2024","unstructured":"Shieh, A. et al. Assessing ChatGPT 4.0\u2019s test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports. Sci. Rep. 14, 1\u20138 (2024).","journal-title":"Sci. Rep."},{"key":"1543_CR61","doi-asserted-by":"publisher","first-page":"3997","DOI":"10.1002\/lary.31434","volume":"134","author":"A Warrier","year":"2024","unstructured":"Warrier, A., Singh, R., Haleem, A., Zaki, H. & Eloy, J. A. The comparative diagnostic capability of large language models in otolaryngology. Laryngoscope 134, 3997\u20134002 (2024).","journal-title":"Laryngoscope"},{"key":"1543_CR62","doi-asserted-by":"publisher","first-page":"1320","DOI":"10.1001\/jama.2023.27861","volume":"331","author":"T Han","year":"2024","unstructured":"Han, T. et al. Comparative analysis of multimodal large language model performance on clinical vignette questions. JAMA 331, 1320\u20131321 (2024).","journal-title":"JAMA"},{"key":"1543_CR63","doi-asserted-by":"publisher","first-page":"1398","DOI":"10.1136\/bjo-2023-325053","volume":"108","author":"D Milad","year":"2024","unstructured":"Milad, D. et al. Assessing the medical reasoning skills of GPT-4 in complex ophthalmology cases. Br. J. Ophthalmol. 108, 1398\u20131405 (2024).","journal-title":"Br. J. Ophthalmol."},{"key":"1543_CR64","doi-asserted-by":"publisher","first-page":"e51391","DOI":"10.2196\/51391","volume":"10","author":"T Abdullahi","year":"2024","unstructured":"Abdullahi, T., Singh, R. & Eickhoff, C. Learning to make rare and complex diagnoses with generative AI assistance: qualitative study of popular large language models. JMIR Med. Educ. 10, e51391 (2024).","journal-title":"JMIR Med. Educ."},{"key":"1543_CR65","doi-asserted-by":"publisher","unstructured":"Tenner, Z. M., Cottone, M. & Chavez, M. Harnessing the open access version of ChatGPT for enhanced clinical opinions. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.08.23.23294478 (2023).","DOI":"10.1101\/2023.08.23.23294478"},{"key":"1543_CR66","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1097\/JCMA.0000000000001064","volume":"87","author":"DWA Luk","year":"2024","unstructured":"Luk, D. W. A., Ip, W. C. T. & Shea, Y.-F. Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties. J. Chin. Med. Assoc. 87, 259 (2024).","journal-title":"J. Chin. Med. Assoc."},{"key":"1543_CR67","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-024-01010-1","volume":"7","author":"T Savage","year":"2024","unstructured":"Savage, T., Nayak, A., Gallo, R., Rangan, E. & Chen, J. H. Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine. NPJ Digit. Med. 7, 1\u20137 (2024).","journal-title":"NPJ Digit. Med."},{"key":"1543_CR68","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1007\/s43678-023-00616-w","volume":"26","author":"JM Franc","year":"2024","unstructured":"Franc, J. M., Cheng, L., Hart, A., Hata, R. & Hertelendy, A. Repeatability, reproducibility, and diagnostic accuracy of a commercial large language model (ChatGPT) to perform emergency department triage using the Canadian triage and acuity scale. CJEM 26, 40\u201346 (2024).","journal-title":"CJEM"},{"key":"1543_CR69","doi-asserted-by":"publisher","first-page":"107924","DOI":"10.1016\/j.compbiomed.2024.107924","volume":"169","author":"J Yang","year":"2024","unstructured":"Yang, J. et al. RDmaster: a novel phenotype-oriented dialogue system supporting differential diagnosis of rare disease. Comput. Biol. Med. 169, 107924 (2024).","journal-title":"Comput. Biol. Med."},{"key":"1543_CR70","doi-asserted-by":"publisher","unstructured":"Reese, J. T. et al. On the limitations of large language models in clinical diagnosis. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.07.13.23292613 (2023).","DOI":"10.1101\/2023.07.13.23292613"},{"key":"1543_CR71","doi-asserted-by":"publisher","unstructured":"do Olmo, J., Logro\u00f1o, J., Masc\u00edas, C., Mart\u00ednez, M. & Isla, J. Assessing DxGPT: diagnosing rare diseases with various large language models. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2024.05.08.24307062 (2024).","DOI":"10.1101\/2024.05.08.24307062"},{"key":"1543_CR72","doi-asserted-by":"publisher","unstructured":"Cesur, T., Gunes, Y. C., Camur, E. & Da\u011fl\u0131, M. Empowering radiologists with ChatGPT-4o: comparative evaluation of large language models and radiologists in cardiac cases. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2024.06.25.24309247 (2024).","DOI":"10.1101\/2024.06.25.24309247"},{"key":"1543_CR73","doi-asserted-by":"publisher","unstructured":"Schramm, S. et al. Impact of multimodal prompt elements on diagnostic performance of GPT-4(V) in challenging brain MRI cases. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2024.03.05.24303767 (2024).","DOI":"10.1101\/2024.03.05.24303767"},{"key":"1543_CR74","doi-asserted-by":"publisher","unstructured":"Gunes, Y. C. & Cesur, T. A comparative study: diagnostic performance of ChatGPT 3.5, Google Bard, Microsoft Bing, and radiologists in thoracic radiology cases. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2024.01.18.24301495 (2024).","DOI":"10.1101\/2024.01.18.24301495"},{"key":"1543_CR75","doi-asserted-by":"publisher","unstructured":"Olshaker, H. et al. Evaluating the diagnostic performance of large language models in identifying complex multisystemic syndromes: a comparative study with radiology residents. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2024.06.05.24308335 (2024).","DOI":"10.1101\/2024.06.05.24308335"},{"key":"1543_CR76","doi-asserted-by":"publisher","DOI":"10.1177\/20552076241265215","volume":"10","author":"T Hirosawa","year":"2024","unstructured":"Hirosawa, T. et al. Diagnostic performance of generative artificial intelligences for a series of complex case reports. Digit. Health 10, 20552076241265215 (2024).","journal-title":"Digit. Health"},{"key":"1543_CR77","doi-asserted-by":"publisher","unstructured":"Mitsuyama, Y. et al. Comparative analysis of GPT-4-based ChatGPT\u2019s diagnostic performance with radiologists using real-world radiology reports of brain tumors. Eur. Radiol. https:\/\/doi.org\/10.1007\/s00330-024-11032-8 (2024).","DOI":"10.1007\/s00330-024-11032-8"},{"key":"1543_CR78","doi-asserted-by":"publisher","unstructured":"Yazaki, M. et al. Emergency patient triage improvement through a retrieval-augmented generation enhanced large-scale language model. Prehosp. Emerg. Care https:\/\/doi.org\/10.1080\/10903127.2024.2374400 (2024).","DOI":"10.1080\/10903127.2024.2374400"},{"key":"1543_CR79","doi-asserted-by":"publisher","first-page":"1732","DOI":"10.1097\/IAE.0000000000004204","volume":"44","author":"S Ghalibafan","year":"2024","unstructured":"Ghalibafan, S. et al. Applications of multimodal generative artificial intelligence in a real-world retina clinic setting. Retina 44, 1732\u20131740 (2024).","journal-title":"Retina"},{"key":"1543_CR80","doi-asserted-by":"publisher","first-page":"2613","DOI":"10.1038\/s41591-024-03097-1","volume":"30","author":"P Hager","year":"2024","unstructured":"Hager, P. et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat. Med. 30, 2613\u20132622 (2024).","journal-title":"Nat. Med."},{"key":"1543_CR81","doi-asserted-by":"publisher","DOI":"10.1007\/s00330-024-10902-5","author":"D Horiuchi","year":"2024","unstructured":"Horiuchi, D. et al. ChatGPT\u2019s diagnostic performance based on textual vs. visual information compared to radiologists\u2019 diagnostic performance in musculoskeletal radiology. Eur. Radiol. https:\/\/doi.org\/10.1007\/s00330-024-10902-5 (2024).","journal-title":"Eur. Radiol."},{"key":"1543_CR82","doi-asserted-by":"publisher","first-page":"1380148","DOI":"10.3389\/fmed.2024.1380148","volume":"11","author":"A R\u00edos-Hoyo","year":"2024","unstructured":"R\u00edos-Hoyo, A. et al. Evaluation of large language models as a diagnostic aid for complex medical cases. Front. Med. 11, 1380148 (2024).","journal-title":"Front. Med."},{"key":"1543_CR83","doi-asserted-by":"publisher","first-page":"e59273","DOI":"10.2196\/59273","volume":"12","author":"X Liu","year":"2024","unstructured":"Liu, X. et al. Claude 3 Opus and ChatGPT With GPT-4 in dermoscopic image analysis for melanoma diagnosis: comparative performance analysis. JMIR Med. Inform. 12, e59273 (2024).","journal-title":"JMIR Med. Inform."},{"key":"1543_CR84","doi-asserted-by":"publisher","unstructured":"Sonoda, Y. et al. Diagnostic performances of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro in \u2018Diagnosis Please\u2019 cases. Jpn. J. Radiol. https:\/\/doi.org\/10.1007\/s11604-024-01619-y (2024).","DOI":"10.1007\/s11604-024-01619-y"},{"key":"1543_CR85","doi-asserted-by":"crossref","unstructured":"Wada, A. et al. Optimizing GPT-4 Turbo diagnostic accuracy in neuroradiology through prompt engineering and confidence thresholds. Diagnostics 14, 1541 (2024).","DOI":"10.3390\/diagnostics14141541"},{"key":"1543_CR86","doi-asserted-by":"publisher","first-page":"104168","DOI":"10.1016\/j.ajp.2024.104168","volume":"100","author":"OK Gargari","year":"2024","unstructured":"Gargari, O. K. et al. Diagnostic accuracy of large language models in psychiatry. Asian J. Psychiatr. 100, 104168 (2024).","journal-title":"Asian J. Psychiatr."},{"key":"1543_CR87","doi-asserted-by":"publisher","DOI":"10.1016\/j.xops.2024.100556","volume":"4","author":"A Mihalache","year":"2024","unstructured":"Mihalache, A. et al. Interpretation of clinical retinal images using an artificial intelligence Chatbot. Ophthalmol. Sci. 4, 100556 (2024).","journal-title":"Ophthalmol. Sci."},{"key":"1543_CR88","doi-asserted-by":"publisher","DOI":"10.1002\/lrh2.10438","volume":"8","author":"GW Rutledge","year":"2024","unstructured":"Rutledge, G. W. Diagnostic accuracy of GPT-4 on common clinical scenarios and challenging cases. Learn Health Syst. 8, e10438 (2024).","journal-title":"Learn Health Syst."},{"key":"1543_CR89","doi-asserted-by":"crossref","unstructured":"Ueda, D. et al. Evaluating GPT-4-based ChatGPT\u2019s clinical potential on the NEJM quiz. BMC Digit. Health 2, 4 (2024).","DOI":"10.1186\/s44247-023-00058-5"},{"key":"1543_CR90","doi-asserted-by":"publisher","unstructured":"Delsoz, M. et al. Performance of ChatGPT in diagnosis of corneal eye diseases. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.08.25.23294635 (2023).","DOI":"10.1101\/2023.08.25.23294635"},{"key":"1543_CR91","doi-asserted-by":"publisher","unstructured":"Brin, D. et al. Assessing GPT-4 multimodal performance in radiological image analysis. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.11.15.23298583 (2023).","DOI":"10.1101\/2023.11.15.23298583"},{"key":"1543_CR92","doi-asserted-by":"publisher","first-page":"e555","DOI":"10.1016\/S2589-7500(24)00097-9","volume":"6","author":"DM Levine","year":"2024","unstructured":"Levine, D. M. et al. The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study. Lancet Digit. Health 6, e555\u2013e561 (2024).","journal-title":"Lancet Digit. Health"},{"key":"1543_CR93","doi-asserted-by":"publisher","first-page":"e248895","DOI":"10.1001\/jamanetworkopen.2024.8895","volume":"7","author":"CYK Williams","year":"2024","unstructured":"Williams, C. Y. K. et al. Use of a large language model to assess clinical acuity of adults in the emergency department. JAMA Netw. Open 7, e248895 (2024).","journal-title":"JAMA Netw. Open"},{"key":"1543_CR94","unstructured":"GPT-4V(ision) System Card. https:\/\/cdn.openai.com\/papers\/GPTV_System_Card.pdf (2023)."},{"key":"1543_CR95","unstructured":"Model Card Claude 3. https:\/\/www-cdn.anthropic.com\/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627\/Model_Card_Claude_3.pdf (2024)."},{"key":"1543_CR96","doi-asserted-by":"publisher","unstructured":"Gemini Team et al. Gemini 1.5: unlocking multimodal understanding across millions of tokens of context. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2403.05530 (2024).","DOI":"10.48550\/ARXIV.2403.05530"},{"key":"1543_CR97","unstructured":"GPT-4o(mni) System card. https:\/\/cdn.openai.com\/gpt-4o-system-card.pdf (2024)."},{"key":"1543_CR98","doi-asserted-by":"publisher","unstructured":"Dubey, A. et al. The Llama 3 herd of models. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2407.21783 (2024).","DOI":"10.48550\/ARXIV.2407.21783"},{"key":"1543_CR99","unstructured":"Perplexity Model Card https:\/\/docs.perplexity.ai\/guides\/model-cards. Perplexity (2024)."},{"key":"1543_CR100","doi-asserted-by":"publisher","unstructured":"\u00dcst\u00fcn, A. et al. Aya model: an instruction finetuned open-access multilingual language model. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2402.07827 (2024).","DOI":"10.48550\/ARXIV.2402.07827"},{"key":"1543_CR101","unstructured":"Model Card Claude 2. https:\/\/www-cdn.anthropic.com\/bd2a28d2535bfb0494cc8e2a3bf135d2e7523226\/Model-Card-Claude-2.pdf (2023)."},{"key":"1543_CR102","unstructured":"Model Card Claude 3.5. https:\/\/www-cdn.anthropic.com\/fed9cc193a14b84131812372d8d5857f8f304c52\/Model_Card_Claude_3_Addendum.pdf (2024)."},{"key":"1543_CR103","doi-asserted-by":"publisher","unstructured":"Toma, A. et al. Clinical Camel: an open expert-level medical language model with dialogue-based knowledge encoding. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2305.12031 (2023).","DOI":"10.48550\/ARXIV.2305.12031"},{"key":"1543_CR104","doi-asserted-by":"publisher","unstructured":"Gemini Team et al. Gemini: a family of highly capable multimodal models. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2312.11805 (2023).","DOI":"10.48550\/ARXIV.2312.11805"},{"key":"1543_CR105","doi-asserted-by":"crossref","unstructured":"Glass version 2.0. GLASS https:\/\/glass.health\/ai (2024).","DOI":"10.22233\/20412495.1224.22"},{"key":"1543_CR106","doi-asserted-by":"publisher","unstructured":"Han, T. et al. Comparative analysis of GPT-4Vision, GPT-4 and Open Source LLMs in clinical diagnostic accuracy: a benchmark against human expertise. Preprint at medRxiv https:\/\/doi.org\/10.1101\/2023.11.03.23297957 (2023).","DOI":"10.1101\/2023.11.03.23297957"},{"key":"1543_CR107","doi-asserted-by":"publisher","unstructured":"Han, T. et al. MedAlpaca\u2014an open-source collection of medical conversational AI models and training data. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2304.08247 (2023).","DOI":"10.48550\/ARXIV.2304.08247"},{"key":"1543_CR108","doi-asserted-by":"publisher","unstructured":"Chen, Z. et al. MEDITRON-70B: scaling medical pretraining for large language models. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2311.16079 (2023).","DOI":"10.48550\/ARXIV.2311.16079"},{"key":"1543_CR109","doi-asserted-by":"publisher","unstructured":"Jiang, A. Q. et al. Mistral 7B. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2310.06825 (2023).","DOI":"10.48550\/ARXIV.2310.06825"},{"key":"1543_CR110","doi-asserted-by":"publisher","unstructured":"Jiang, A. Q. et al. Mixtral of experts. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2401.04088 (2024).","DOI":"10.48550\/ARXIV.2401.04088"},{"key":"1543_CR111","unstructured":"Zhang, V. NVIDIA AI foundation models: build custom enterprise Chatbots and co-pilots with production-ready LLMs. NVIDIA Technical Blog https:\/\/developer.nvidia.com\/blog\/nvidia-ai-foundation-models-build-custom-enterprise-chatbots-and-co-pilots-with-production-ready-llms\/ (2023)."},{"key":"1543_CR112","doi-asserted-by":"publisher","unstructured":"K\u00f6pf, A. et al. OpenAssistant Conversations\u2014democratizing large language model alignment. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2304.07327 (2023).","DOI":"10.48550\/ARXIV.2304.07327"},{"key":"1543_CR113","doi-asserted-by":"publisher","unstructured":"Xu, C. et al. WizardLM: empowering large language models to follow complex instructions. Preprint at arXiv https:\/\/doi.org\/10.48550\/ARXIV.2304.12244 (2023).","DOI":"10.48550\/ARXIV.2304.12244"},{"key":"1543_CR114","doi-asserted-by":"publisher","first-page":"51","DOI":"10.7326\/M18-1376","volume":"170","author":"RF Wolff","year":"2019","unstructured":"Wolff, R. F. et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann. Intern. Med. 170, 51\u201358 (2019).","journal-title":"Ann. Intern. Med."},{"key":"1543_CR115","doi-asserted-by":"publisher","DOI":"10.1136\/bmjgh-2018-000798","volume":"3","author":"B Wahl","year":"2018","unstructured":"Wahl, B., Cossy-Gantner, A., Germann, S. & Schwalbe, N. R. Artificial intelligence (AI) and global health: how can AI contribute to health in resource-poor settings? BMJ Glob. Health 3, e000798 (2018).","journal-title":"BMJ Glob. Health"},{"key":"1543_CR116","doi-asserted-by":"publisher","first-page":"e48785","DOI":"10.2196\/48785","volume":"9","author":"C Preiksaitis","year":"2023","unstructured":"Preiksaitis, C. & Rose, C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR Med. Educ. 9, e48785 (2023).","journal-title":"JMIR Med. Educ."},{"key":"1543_CR117","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1038\/s43856-023-00370-1","volume":"3","author":"J Clusmann","year":"2023","unstructured":"Clusmann, J. et al. The future landscape of large language models in medicine. Commun. Med. 3, 141 (2023).","journal-title":"Commun. Med."},{"key":"1543_CR118","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s11604-023-01474-3","volume":"42","author":"D Ueda","year":"2023","unstructured":"Ueda, D. et al. Fairness of artificial intelligence in healthcare: review and recommendations. Jpn. J. Radiol. 42, 3\u201315 (2023).","journal-title":"Jpn. J. Radiol."},{"key":"1543_CR119","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1001\/jama.2017.19163","volume":"319","author":"MDF McInnes","year":"2018","unstructured":"McInnes, M. D. F. et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA 319, 388\u2013396 (2018).","journal-title":"JAMA"},{"key":"1543_CR120","doi-asserted-by":"publisher","first-page":"b2535","DOI":"10.1136\/bmj.b2535","volume":"339","author":"D Moher","year":"2009","unstructured":"Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G. & PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 339, b2535 (2009).","journal-title":"BMJ"},{"key":"1543_CR121","doi-asserted-by":"publisher","first-page":"1100","DOI":"10.1007\/s11604-024-01608-1","volume":"42","author":"SL Walston","year":"2024","unstructured":"Walston, S. L. et al. Data set terminology of deep learning in medicine: a historical review and recommendation. Jpn. J. Radiol. 42, 1100\u20131109 (2024).","journal-title":"Jpn. J. Radiol."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01543-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01543-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01543-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,25]],"date-time":"2025-03-25T23:28:56Z","timestamp":1742945336000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01543-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,22]]},"references-count":121,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1543"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-01543-z","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.01.20.24301563","asserted-by":"object"}]},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,22]]},"assertion":[{"value":"26 July 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 March 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"175"}}