{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T01:20:36Z","timestamp":1773969636455,"version":"3.50.1"},"reference-count":61,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T00:00:00Z","timestamp":1764201600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T00:00:00Z","timestamp":1764201600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Wellcome Trust Early-Career","award":["318987\/Z\/24\/Z"],"award-info":[{"award-number":["318987\/Z\/24\/Z"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Vision-language models (VLMs) show promise for answering clinically relevant questions, but their robustness to medical image artefacts remains unclear. We evaluated VLMs\u2019 robustness through their performance on images with and without weak artefacts across five artefact categories, as well as their ability to detect images with strong artefacts. We built evaluation benchmarks using brain MRI scans, Chest X-ray, and retinal images, involving four real-world medical datasets. VLMs achieved moderate accuracy on original unaltered images (0.645, 0.602 and 0.604 for MRI, OCT, and X-ray applications, respectively). Accuracy declined with weak artefacts (\u22123.34%, \u22129.06% and \u221210.46%), while strong artefacts were detected at low rates (0.194, 0.128 and 0.115). 
Our findings indicate that VLMs are not yet capable of performing tasks on medical images with artefacts, underscoring the need to establish uniform benchmarks that thoroughly examine model robustness to image artefacts, and to explicitly incorporate artefact-aware method design and robustness tests into VLM development.<\/jats:p>","DOI":"10.1038\/s41746-025-02108-w","type":"journal-article","created":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T14:01:50Z","timestamp":1764252110000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Understanding the robustness of vision-language models to medical image artefacts"],"prefix":"10.1038","volume":"8","author":[{"given":"Zijie","family":"Cheng","sequence":"first","affiliation":[]},{"given":"Ariel Yuhan","family":"Ong","sequence":"additional","affiliation":[]},{"given":"Siegfried K.","family":"Wagner","sequence":"additional","affiliation":[]},{"given":"David A.","family":"Merle","sequence":"additional","affiliation":[]},{"given":"Lie","family":"Ju","sequence":"additional","affiliation":[]},{"given":"Hanyuan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Ruinian","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Linze","family":"Pang","sequence":"additional","affiliation":[]},{"given":"Boxuan","family":"Li","sequence":"additional","affiliation":[]},{"given":"Tiantian","family":"He","sequence":"additional","affiliation":[]},{"given":"Anran","family":"Ran","sequence":"additional","affiliation":[]},{"given":"Hongyang","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Dawei Gabriel","family":"YANG","sequence":"additional","affiliation":[]},{"given":"Ke","family":"Zou","sequence":"additional","affiliation":[]},{"given":"Jocelyn Hui 
Lin","family":"Goh","sequence":"additional","affiliation":[]},{"given":"Sahana","family":"Srinivasan","sequence":"additional","affiliation":[]},{"given":"Andre","family":"Altmann","sequence":"additional","affiliation":[]},{"given":"Daniel C.","family":"Alexander","sequence":"additional","affiliation":[]},{"given":"Carol Y.","family":"Cheung","sequence":"additional","affiliation":[]},{"given":"Yih Chung","family":"Tham","sequence":"additional","affiliation":[]},{"given":"Pearse A.","family":"Keane","sequence":"additional","affiliation":[]},{"given":"Yukun","family":"Zhou","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,11,27]]},"reference":[{"key":"2108_CR1","doi-asserted-by":"publisher","unstructured":"Bordes, F. et al. An introduction to vision-language modeling. arXiv [cs.LG] https:\/\/doi.org\/10.48550\/arXiv.2405.17247 (2024).","DOI":"10.48550\/arXiv.2405.17247"},{"key":"2108_CR2","doi-asserted-by":"publisher","unstructured":"OpenAI et al. GPT-4 Technical Report. arXiv [cs.CL] https:\/\/doi.org\/10.48550\/arXiv.2303.08774 (2023).","DOI":"10.48550\/arXiv.2303.08774"},{"key":"2108_CR3","unstructured":"Anthropic et al. The Claude 3 Model Family: Opus, Sonnet, Haiku. https:\/\/www.anthropic.com (2024)."},{"key":"2108_CR4","unstructured":"Yang, X., Wu, Y., Yang, M., Chen, H. & Geng, X. Exploring diverse in-context configurations for image captioning. Adv. Neural Inf. Process. Syst. 36, 40924\u201340943 (2023)."},{"key":"2108_CR5","unstructured":"He, S. et al. MedDr: Diagnosis-guided bootstrapping for large-scale medical vision-language learning. CoRR http:\/\/arxiv.org\/abs\/2404.15127 (2024)."},{"key":"2108_CR6","doi-asserted-by":"crossref","unstructured":"Li, C. et al. LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. Adv. Neural Inf. Process. Syst. 36, 28541\u201328564 (2023).","DOI":"10.32388\/VLXB6M"},{"key":"2108_CR7","doi-asserted-by":"publisher","unstructured":"Zhang, S. 
et al. BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. arXiv [cs.CV] https:\/\/doi.org\/10.48550\/arXiv.2303.00915 (2023).","DOI":"10.48550\/arXiv.2303.00915"},{"key":"2108_CR8","doi-asserted-by":"crossref","unstructured":"Murray, R. & Werpy, N. Image interpretation and artefacts. in Equine MRI 101\u2013145 (Wiley Online Library, 2010).","DOI":"10.1002\/9781118786574.ch4"},{"key":"2108_CR9","doi-asserted-by":"crossref","unstructured":"Goceri, E. Medical image data augmentation: techniques, comparisons and interpretations. Artif. Intell. Rev. 56, 12561\u201312605 (2023).","DOI":"10.1007\/s10462-023-10453-z"},{"key":"2108_CR10","doi-asserted-by":"crossref","unstructured":"Hindi, A., Peterson, C. & Barr, R. G. Artifacts in diagnostic ultrasound. Rep. Med. Imaging 6, 29\u201348 (2013).","DOI":"10.2147\/RMI.S33464"},{"key":"2108_CR11","doi-asserted-by":"publisher","first-page":"6384","DOI":"10.1007\/s00330-021-07709-z","volume":"31","author":"H Arabi","year":"2021","unstructured":"Arabi, H. & Zaidi, H. Deep learning-based metal artefact reduction in PET\/CT imaging. Eur. Radiol. 31, 6384\u20136396 (2021).","journal-title":"Eur. Radiol."},{"key":"2108_CR12","doi-asserted-by":"publisher","first-page":"106391","DOI":"10.1016\/j.compbiomed.2022.106391","volume":"152","author":"F Garcea","year":"2023","unstructured":"Garcea, F., Serra, A., Lamberti, F. & Morra, L. Data augmentation for medical imaging: a systematic literature review. Comput. Biol. Med. 152, 106391 (2023).","journal-title":"Comput. Biol. Med."},{"key":"2108_CR13","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1007\/s10334-005-0101-0","volume":"18","author":"G Eggers","year":"2005","unstructured":"Eggers, G. et al. Artefacts in magnetic resonance imaging caused by dental material. 
MAGMA 18, 103\u2013111 (2005).","journal-title":"MAGMA"},{"key":"2108_CR14","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1016\/j.ajo.2004.07.050","volume":"139","author":"R Ray","year":"2005","unstructured":"Ray, R., Stinnett, S. S. & Jaffe, G. J. Evaluation of image artifact produced by optical coherence tomography of retinal pathology. Am. J. Ophthalmol. 139, 18\u201329 (2005).","journal-title":"Am. J. Ophthalmol."},{"key":"2108_CR15","doi-asserted-by":"publisher","first-page":"229","DOI":"10.2217\/iim.12.13","volume":"4","author":"FE Boas","year":"2012","unstructured":"Boas, F. E. & Fleischmann, D. CT artifacts: causes and reduction techniques. Imaging Med. 4, 229\u2013240 (2012).","journal-title":"Imaging Med."},{"key":"2108_CR16","doi-asserted-by":"publisher","first-page":"123","DOI":"10.4103\/JOCO.JOCO_83_20","volume":"32","author":"F Bazvand","year":"2020","unstructured":"Bazvand, F. & Ghassemi, F. Artifacts in macular optical coherence tomography. J. Curr. Ophthalmol. 32, 123\u2013131 (2020).","journal-title":"J. Curr. Ophthalmol."},{"key":"2108_CR17","doi-asserted-by":"crossref","unstructured":"Spaide, R. F., Fujimoto, J. G. & Waheed, N. K. Image artifacts in optical coherence tomography angiography. Retina 35, 2163\u20132180 (2015)","DOI":"10.1097\/IAE.0000000000000765"},{"key":"2108_CR18","doi-asserted-by":"publisher","first-page":"679","DOI":"10.1007\/s13244-010-0062-3","volume":"2","author":"NM Long","year":"2011","unstructured":"Long, N. M. & Smith, C. S. Causes and imaging features of false positives and false negatives on F-PET\/CT in oncologic imaging. Insights Imaging 2, 679\u2013698 (2011).","journal-title":"Insights Imaging"},{"key":"2108_CR19","doi-asserted-by":"publisher","first-page":"815","DOI":"10.1097\/JTO.0b013e31824abd9c","volume":"7","author":"BD Gelbman","year":"2012","unstructured":"Gelbman, B. D. et al. Radiographic and clinical characterization of false negative results from CT-guided needle biopsies of lung nodules. J. 
Thorac. Oncol. 7, 815\u2013820 (2012).","journal-title":"J. Thorac. Oncol."},{"key":"2108_CR20","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1016\/S0001-2998(96)80005-5","volume":"26","author":"DM Howarth","year":"1996","unstructured":"Howarth, D. M., Forstrom, L. A., O\u2019Connor, M. K., Thomas, P. A. & Cardew, A. P. Patient-related pitfalls and artifacts in nuclear medicine imaging. Semin. Nucl. Med. 26, 295\u2013307 (1996).","journal-title":"Semin. Nucl. Med."},{"key":"2108_CR21","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1259\/bjr\/47213729","volume":"85","author":"I Millet","year":"2012","unstructured":"Millet, I. et al. Pearls and pitfalls in breast MRI. Br. J. Radiol. 85, 197\u2013207 (2012).","journal-title":"Br. J. Radiol."},{"key":"2108_CR22","doi-asserted-by":"crossref","unstructured":"Ali, S. et al. An objective comparison of detection and segmentation algorithms for artefacts in clinical endoscopy. Sci. Rep. 10, 2748 (2020).","DOI":"10.1038\/s41598-020-59413-5"},{"key":"2108_CR23","doi-asserted-by":"publisher","first-page":"107251","DOI":"10.1016\/j.ultras.2024.107251","volume":"140","author":"L Howell","year":"2024","unstructured":"Howell, L., Ingram, N., Lapham, R., Morrell, A. & McLaughlan, J. R. Deep learning for real-time multi-class segmentation of artefacts in lung ultrasound. Ultrasonics 140, 107251 (2024).","journal-title":"Ultrasonics"},{"key":"2108_CR24","first-page":"10","volume":"1","author":"S Elyounssi","year":"2025","unstructured":"Elyounssi, S. et al. Addressing artifactual bias in large, automated MRI analyses of brain development. Nat. Neurosci. 1, 10 (2025).","journal-title":"Nat. Neurosci."},{"key":"2108_CR25","doi-asserted-by":"crossref","unstructured":"Ye, J. et al. Gmai-MMBench: A comprehensive multimodal evaluation benchmark towards general medical AI. Adv. Neural Inf. Process. Syst. 
37, 94327\u201394427 (2024).","DOI":"10.52202\/079017-2992"},{"key":"2108_CR26","doi-asserted-by":"crossref","unstructured":"Gu, Z., Chen, J., Liu, F., Yin, C. & Zhang, P. MedVH: Toward systematic evaluation of hallucination for large vision language models in the medical context. Adv. Intell. Syst. 7, 2500255 (2025).","DOI":"10.1002\/aisy.202500255"},{"key":"2108_CR27","doi-asserted-by":"crossref","unstructured":"Xia, P. et al. CARES: A comprehensive benchmark of trustworthiness in medical vision language models. Adv. Neural Inf. Process. Syst. 37, 140334\u2013140365 (2024).","DOI":"10.52202\/079017-4455"},{"key":"2108_CR28","unstructured":"Royer, C., Menze, B. & Sekuboyina, A. Multimedeval: A benchmark and a toolkit for evaluating medical vision-language models. MIDL arXiv:2402.09262 (2024)."},{"key":"2108_CR29","doi-asserted-by":"publisher","unstructured":"Jiang, Y. et al. Evaluating general vision-language models for clinical medicine. medRxiv https:\/\/doi.org\/10.1101\/2024.04.12.24305744 (2024).","DOI":"10.1101\/2024.04.12.24305744"},{"key":"2108_CR30","doi-asserted-by":"crossref","unstructured":"Yildirim, N. et al. Multimodal healthcare AI: identifying and designing clinically relevant vision-language applications for radiology. CHI Conf. Hum. Factors Comput. Syst. 1\u201322 (2024).","DOI":"10.1145\/3613904.3642013"},{"key":"2108_CR31","doi-asserted-by":"publisher","first-page":"873","DOI":"10.1111\/epi.17907","volume":"65","author":"E van Diessen","year":"2024","unstructured":"van Diessen, E., van Amerongen, R. A., Zijlmans, M. & Otte, W. M. Potential merits and flaws of large language models in epilepsy care: a critical review. Epilepsia 65, 873\u2013886 (2024).","journal-title":"Epilepsia"},{"key":"2108_CR32","doi-asserted-by":"publisher","first-page":"102412","DOI":"10.1016\/j.inffus.2024.102412","volume":"108","author":"E Nasarian","year":"2024","unstructured":"Nasarian, E., Alizadehsani, R., Acharya, U. R. & Tsui, K.-L. 
Designing interpretable ML system to enhance trust in healthcare: a systematic review to proposed responsible clinician-AI-collaboration framework. Inf. Fusion 108, 102412 (2024).","journal-title":"Inf. Fusion"},{"key":"2108_CR33","doi-asserted-by":"crossref","unstructured":"Tanno, R. et al. Collaboration between clinicians and vision-language models in radiology report generation. Nat. Med. 31, 599\u2013608 (2025).","DOI":"10.1038\/s41591-024-03302-1"},{"key":"2108_CR34","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-024-80917-x","volume":"14","author":"J Zhang","year":"2024","unstructured":"Zhang, J. et al. A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis. Sci. Rep. 14, 30385 (2024).","journal-title":"Sci. Rep."},{"key":"2108_CR35","doi-asserted-by":"publisher","unstructured":"Grattafiori, A. et al. The Llama 3 herd of models. arXiv [cs.AI] https:\/\/doi.org\/10.48550\/arXiv.2407.21783 (2024).","DOI":"10.48550\/arXiv.2407.21783"},{"key":"2108_CR36","doi-asserted-by":"publisher","unstructured":"Sellergren, A. et al. MedGemma Technical Report. arXiv [cs.AI] https:\/\/doi.org\/10.48550\/arXiv.2507.05201 (2025).","DOI":"10.48550\/arXiv.2507.05201"},{"key":"2108_CR37","doi-asserted-by":"crossref","unstructured":"Van, M.-H., Verma, P. & Wu, X. On large visual language models for medical imaging analysis: An empirical study. In 2024 IEEE\/ACM Conf. Connected Health: Appl. Syst. Eng. Technol. 172\u2013176 (2024).","DOI":"10.1109\/CHASE60773.2024.00029"},{"key":"2108_CR38","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1038\/s41586-023-06555-x","volume":"622","author":"Y Zhou","year":"2023","unstructured":"Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. 
Nature 622, 156\u2013163 (2023).","journal-title":"Nature"},{"key":"2108_CR39","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1038\/s41746-025-01533-1","volume":"8","author":"Q Xie","year":"2025","unstructured":"Xie, Q. et al. Medical foundation large language models for comprehensive text analysis and beyond. NPJ Digit. Med. 8, 141 (2025).","journal-title":"NPJ Digit. Med."},{"key":"2108_CR40","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1102\/1470-7330.2007.0014","volume":"7","author":"JA van Dalen","year":"2007","unstructured":"van Dalen, J. A., Vogel, W. V., Corstens, F. H. M. & Oyen, W. J. G. Multi-modality nuclear medicine imaging: artefacts, pitfalls and recommendations. Cancer Imaging 7, 77\u201383 (2007).","journal-title":"Cancer Imaging"},{"key":"2108_CR41","doi-asserted-by":"publisher","first-page":"93","DOI":"10.12659\/PJR.892628","volume":"80","author":"K Krupa","year":"2015","unstructured":"Krupa, K. & Bekiesi\u0144ska-Figatowska, M. Artifacts in magnetic resonance imaging. Pol. J. Radiol. 80, 93\u2013106 (2015).","journal-title":"Pol. J. Radiol."},{"key":"2108_CR42","doi-asserted-by":"crossref","unstructured":"Alderman, J. E. et al. Tackling algorithmic bias and promoting transparency in health datasets: The STANDING Together consensus recommendations. Lancet Digit. Health 7, e64\u2013e88 (2025).","DOI":"10.1016\/S2589-7500(24)00224-3"},{"key":"2108_CR43","doi-asserted-by":"crossref","unstructured":"Transparency (in training data) is what we want. Nat. Mach. Intell. 7, 329\u2013329 (2025).","DOI":"10.1038\/s42256-025-01023-9"},{"key":"2108_CR44","unstructured":"Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824\u201324837 (2022)."},{"key":"2108_CR45","unstructured":"Feng, G. et al. Towards revealing the mystery behind chain of thought: a theoretical perspective. Adv. Neural Inf. Process. Syst. 
36, 70757\u201370798 (2023)."},{"key":"2108_CR46","doi-asserted-by":"crossref","unstructured":"Zhang, F., Ling, Y., Yang, J., Zhang, P. & Zhang, G. COVID-19 knowledge mining based on large language models and chain-of-thought reasoning. BIBM. 6574\u20136581 (2024).","DOI":"10.1109\/BIBM62325.2024.10822266"},{"key":"2108_CR47","doi-asserted-by":"publisher","first-page":"1929","DOI":"10.1093\/jamia\/ocae095","volume":"31","author":"M Li","year":"2024","unstructured":"Li, M., Zhou, H., Yang, H. & Zhang, R. RT: a Retrieving and Chain-of-Thought framework for few-shot medical named entity recognition. J. Am. Med. Inform. Assoc. 31, 1929\u20131938 (2024).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"2108_CR48","unstructured":"Zhang, Z., Zhang, A., Li, M. & Smola, A. Automatic chain of thought prompting in large language models. ICLR arXiv:2210.03493 (2022)."},{"key":"2108_CR49","doi-asserted-by":"crossref","unstructured":"Jeong, D. P., Garg, S., Lipton, Z. C. & Oberst, M. Medical adaptation of large language and vision-language models: Are we making progress? EMNLP arXiv:2411.04118 (2024).","DOI":"10.18653\/v1\/2024.emnlp-main.677"},{"key":"2108_CR50","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-024-51465-9","volume":"15","author":"D Ferber","year":"2024","unstructured":"Ferber, D. et al. In-context learning enables multimodal large language models to classify cancer pathology images. Nat. Commun. 15, 10104 (2024).","journal-title":"Nat. Commun."},{"key":"2108_CR51","doi-asserted-by":"publisher","unstructured":"Nickparvar, M. Brain Tumor MRI Dataset. Kaggle. https:\/\/doi.org\/10.34740\/kaggle\/dsv\/2645886 (2021).","DOI":"10.34740\/kaggle\/dsv\/2645886"},{"key":"2108_CR52","doi-asserted-by":"publisher","unstructured":"Cheng, J. Brain tumor dataset. figshare. 
https:\/\/doi.org\/10.6084\/M9.FIGSHARE.1512427.V1 (2015).","DOI":"10.6084\/M9.FIGSHARE.1512427.V1"},{"key":"2108_CR53","doi-asserted-by":"publisher","unstructured":"Bhuvaji, S., Kadam, A., Bhumkar, P., Dedge, S. & Kanchan, S. Brain Tumor Classification (MRI). Kaggle. https:\/\/doi.org\/10.34740\/kaggle\/dsv\/12745533 (2020).","DOI":"10.34740\/kaggle\/dsv\/12745533"},{"key":"2108_CR54","unstructured":"Hamada, A. H. Br35H: Brain Tumor Detection 2020. Kaggle. https:\/\/www.kaggle.com\/datasets\/ahmedhamada0\/brain-tumor-detection (2021)."},{"key":"2108_CR55","doi-asserted-by":"publisher","unstructured":"Kermany, D. Labeled optical coherence tomography (OCT) and Chest X-Ray images for classification. Mendeley. https:\/\/doi.org\/10.17632\/rscbjbr9sj.2 (2018).","DOI":"10.17632\/rscbjbr9sj.2"},{"key":"2108_CR56","unstructured":"Deshpande, D. COVID-19 Detection X-Ray Dataset. Kaggle. https:\/\/www.kaggle.com\/datasets\/darshan1504\/covid19-detection-xray-dataset (2020)."},{"key":"2108_CR57","doi-asserted-by":"publisher","first-page":"1122","DOI":"10.1016\/j.cell.2018.02.010","volume":"172","author":"DS Kermany","year":"2018","unstructured":"Kermany, D. S. et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172, 1122\u20131131.e9 (2018).","journal-title":"Cell"},{"key":"2108_CR58","doi-asserted-by":"crossref","unstructured":"P\u00e9rez-Garc\u00eda, F., Sparks, R. & Ourselin, S. TorchIO: a Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Comput. Methods Programs Biomed. 208, 106236 (2021).","DOI":"10.1016\/j.cmpb.2021.106236"},{"key":"2108_CR59","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1016\/j.ins.2019.06.011","volume":"501","author":"T Li","year":"2019","unstructured":"Li, T. et al. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Inf. Sci. 501, 511\u2013522 (2019).","journal-title":"Inf. 
Sci."},{"key":"2108_CR60","doi-asserted-by":"publisher","unstructured":"OpenAI et al. GPT-4o System Card. arXiv [cs.CL] https:\/\/doi.org\/10.48550\/arXiv.2410.21276 (2024).","DOI":"10.48550\/arXiv.2410.21276"},{"key":"2108_CR61","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-025-02601-y","volume":"15","author":"J Ma","year":"2025","unstructured":"Ma, J. et al. Large language model evaluation in autoimmune disease clinical questions comparing ChatGPT 4o, Claude 3.5 Sonnet and Gemini 1.5 pro. Sci. Rep. 15, 17635 (2025).","journal-title":"Sci. Rep."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-02108-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-02108-w","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-02108-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T14:02:03Z","timestamp":1764252123000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-02108-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,27]]},"references-count":61,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["2108"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-02108-w","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,27]]},"assertion":[{"value":"12 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 October 
2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 November 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"727"}}