{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T06:29:35Z","timestamp":1781764175912,"version":"3.54.5"},"reference-count":33,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T00:00:00Z","timestamp":1740960000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:sec><jats:title>Background<\/jats:title><jats:p>Artificial intelligence (AI) has made great strides. To explore the potential of Large Language Models (LLMs) in providing medical services to patients and assisting physicians in clinical practice, our study evaluated the performance in delivering clinical questions related to autoimmune diseases.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>46 questions related to autoimmune diseases were input into ChatGPT 3.5, ChatGPT 4.0, and Gemini. The responses were then evaluated by rheumatologists based on five quality dimensions: relevance, correctness, completeness, helpfulness, and safety. Simultaneously, the responses were assessed by laboratory specialists across six medical fields: concept, clinical features, report interpretation, diagnosis, prevention and treatment, and prognosis. 
Finally, statistical analysis and comparisons were performed on the performance of the three chatbots in the five quality dimensions and six medical fields.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>ChatGPT 4.0 outperformed both ChatGPT 3.5 and Gemini across all five quality dimensions, with an average score of 199.8\u2009\u00b1\u200910.4, significantly higher than ChatGPT 3.5 (175.7\u2009\u00b1\u200916.6) and Gemini (179.1\u2009\u00b1\u200911.8) (<jats:italic>p<\/jats:italic>\u2009=\u20090.009 and <jats:italic>p<\/jats:italic>\u2009=\u20090.001, respectively). The average performance differences between ChatGPT 3.5 and Gemini across these five dimensions were not statistically significant. Specifically, ChatGPT 4.0 demonstrated superior performance in relevance (<jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001, <jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001), completeness (<jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001, <jats:italic>p<\/jats:italic>\u2009=\u20090.0006), correctness (<jats:italic>p<\/jats:italic>\u2009=\u20090.0001, <jats:italic>p<\/jats:italic>\u2009=\u20090.0002), helpfulness (<jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001, <jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001), and safety (<jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001, <jats:italic>p<\/jats:italic>\u2009=\u20090.0025) compared to both ChatGPT 3.5 and Gemini. 
Furthermore, ChatGPT 4.0 scored significantly higher than both ChatGPT 3.5 and Gemini in medical fields such as report interpretation (<jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001, <jats:italic>p<\/jats:italic>\u2009=\u20090.0025), prevention and treatment (<jats:italic>p<\/jats:italic>\u2009&lt;\u20090.0001, <jats:italic>p<\/jats:italic>\u2009=\u20090.0103), prognosis (<jats:italic>p<\/jats:italic>\u2009=\u20090.0458, <jats:italic>p<\/jats:italic>\u2009=\u20090.0458).<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>This study demonstrates that ChatGPT 4.0 significantly outperforms ChatGPT 3.5 and Gemini in addressing clinical questions related to autoimmune diseases, showing notable advantages across all five quality dimensions and six clinical domains. These findings further highlight the potential of large language models in enhancing healthcare services.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fdgth.2025.1530442","type":"journal-article","created":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T17:21:46Z","timestamp":1741022506000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["A comparative analysis of large language models on clinical questions for autoimmune 
diseases"],"prefix":"10.3389","volume":"7","author":[{"given":"Jing","family":"Chen","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Juntao","family":"Ma","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jie","family":"Yu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Weiming","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yijia","family":"Zhu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiawei","family":"Feng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Linyu","family":"Geng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xianchi","family":"Dong","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Huayong","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuxin","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mingzhe","family":"Ning","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,3,3]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1126\/science.aaa8685","article-title":"Advances in natural language processing","volume":"349","author":"Hirschberg","year":"2015","journal-title":"Science"},{"key":"B2","doi-asserted-by":"publisher","first-page":"1166120","DOI":"10.3389\/fpubh.2023.1166120","article-title":"ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health","volume":"11","author":"De 
Angelis","year":"2023","journal-title":"Front Public Health"},{"key":"B3","doi-asserted-by":"publisher","first-page":"1233","DOI":"10.1056\/NEJMsr2214184","article-title":"Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine","volume":"388","author":"Lee","year":"2023","journal-title":"N Engl J Med"},{"key":"B4","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1038\/d41586-023-00816-5","article-title":"GPT-4 is here: what scientists think","volume":"615","author":"Sanderson","year":"2023","journal-title":"Nature"},{"key":"B5","doi-asserted-by":"publisher","first-page":"1078","DOI":"10.1016\/j.jaad.2024.01.037","article-title":"Assessing the accuracy, usefulness, and readability of artificial-intelligence-generated responses to common dermatologic surgery questions for patient education: a double-blinded comparative study of ChatGPT and Google bard","volume":"90","author":"Robinson","year":"2024","journal-title":"J Am Acad Dermatol"},{"key":"B6","doi-asserted-by":"publisher","first-page":"916","DOI":"10.1016\/j.ccell.2021.04.002","article-title":"Artificial intelligence for clinical oncology","volume":"39","author":"Kann","year":"2021","journal-title":"Cancer Cell"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1216","DOI":"10.1016\/j.jhep.2023.01.006","article-title":"Artificial intelligence, machine learning, and deep learning in liver transplantation","volume":"78","author":"Bhat","year":"2023","journal-title":"J Hepatol"},{"key":"B8","doi-asserted-by":"publisher","first-page":"e486","DOI":"10.1016\/s2589-7500(20)30160-6","article-title":"Artificial intelligence in medical imaging: switching from radiographic pathological data to clinically meaningful endpoints","volume":"2","author":"Oren","year":"2020","journal-title":"Lancet Digit Health"},{"key":"B9","doi-asserted-by":"publisher","first-page":"e59954","DOI":"10.7759\/cureus.59954","article-title":"Unveiling the influence of AI predictive analytics on patient outcomes: a 
comprehensive narrative review","volume":"16","author":"Dixon","year":"2024","journal-title":"Cureus"},{"key":"B10","doi-asserted-by":"publisher","first-page":"689","DOI":"10.1186\/s12909-023-04698-z","article-title":"Revolutionizing healthcare: the role of artificial intelligence in clinical practice","volume":"23","author":"Alowais","year":"2023","journal-title":"BMC Med Educ"},{"key":"B11","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1186\/s12967-024-05067-0","article-title":"Tribulations and future opportunities for artificial intelligence in precision medicine","volume":"22","author":"Carini","year":"2024","journal-title":"J Transl Med"},{"key":"B12","doi-asserted-by":"publisher","first-page":"340","DOI":"10.1056\/nejm200108023450506","article-title":"Autoimmune diseases","volume":"345","author":"Davidson","year":"2001","journal-title":"N Engl J Med"},{"key":"B13","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1159\/000478012","article-title":"Autoimmunity in the elderly: insights from basic science and clinics\u2014a mini-review","volume":"63","author":"Watad","year":"2017","journal-title":"Gerontology"},{"key":"B14","doi-asserted-by":"publisher","first-page":"729","DOI":"10.1016\/s0749-0704(02)00025-8","article-title":"Rheumatologic diseases in the intensive care unit: epidemiology, clinical approach, management, and outcome","volume":"18","author":"Janssen","year":"2002","journal-title":"Crit Care Clin"},{"key":"B15","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1016\/j.chest.2020.03.050","article-title":"One-Year outcome of critically ill patients with systemic rheumatic disease: a multicenter cohort study","volume":"158","author":"Larcher","year":"2020","journal-title":"Chest"},{"key":"B16","doi-asserted-by":"publisher","first-page":"927","DOI":"10.1378\/chest.14-3098","article-title":"Outcomes in critically ill patients with systemic rheumatic disease: a multicenter 
study","volume":"148","author":"Dumas","year":"2015","journal-title":"Chest"},{"key":"B17","doi-asserted-by":"publisher","first-page":"3256","DOI":"10.1093\/rheumatology\/kead291","article-title":"AI Am a rheumatologist: a practical primer to large language models for rheumatologists","volume":"62","author":"Venerito","year":"2023","journal-title":"Rheumatology (Oxford)"},{"key":"B18","doi-asserted-by":"publisher","first-page":"103698","DOI":"10.1016\/j.autrev.2024.103698","article-title":"Artificial intelligence meets the world experts; updates and novel therapies in autoimmunity\u2014the 14th international congress on autoimmunity 2024 (AUTO14), Ljubljana","volume":"24","author":"Mahroum","year":"2025","journal-title":"Autoimmun Rev"},{"key":"B19","first-page":"42","volume-title":"Discrimination, Artificial Intelligence, and Algorithmic Decision-Making","author":"Zuiderveen Borgesius","year":"2018"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1057\/s41599-022-01483-z","article-title":"Ethics and discrimination in artificial intelligence-enabled recruitment practices","volume":"10","author":"Chen","year":"2023","journal-title":"Humanit Soc Sci Commun"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1158","DOI":"10.1515\/cclm-2023-0355","article-title":"Potentials and pitfalls of ChatGPT and natural-language artificial intelligence models for the understanding of laboratory medicine test results. 
An assessment by the European federation of clinical chemistry and laboratory medicine (EFLM) working group on artificial intelligence (WG-AI)","volume":"61","author":"Cadamuro","year":"2023","journal-title":"Clin Chem Lab Med"},{"key":"B22","doi-asserted-by":"publisher","first-page":"1362","DOI":"10.1515\/cclm-2023-1058","article-title":"Comparison of three chatbots as an assistant for problem-solving in clinical laboratory","volume":"62","author":"Abusoglu","year":"2024","journal-title":"Clin Chem Lab Med"},{"key":"B23","doi-asserted-by":"publisher","first-page":"e0288453","DOI":"10.1371\/journal.pone.0288453","article-title":"Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis","volume":"18","author":"Zaitsu","year":"2023","journal-title":"PLoS One"},{"key":"B24","doi-asserted-by":"publisher","first-page":"104884","DOI":"10.1016\/j.idnow.2024.104884","article-title":"Evaluating ChatGPT ability to answer urinary tract infection-related questions","volume":"54","author":"Cakir","year":"2024","journal-title":"Infect Dis now"},{"key":"B25","doi-asserted-by":"publisher","first-page":"e13500","DOI":"10.1111\/srt.13500","article-title":"Assess the precision of ChatGPT\u2019s responses regarding systemic lupus erythematosus (SLE) inquiries","volume":"29","author":"Huang","year":"2023","journal-title":"Skin Res Technol"},{"key":"B26","doi-asserted-by":"publisher","first-page":"e47754","DOI":"10.7759\/cureus.47754","article-title":"ChatGPT\u2019s epoch in rheumatological diagnostics: a critical assessment in the context of Sj\u00f6gren\u2019s syndrome","volume":"15","author":"Irfan","year":"2023","journal-title":"Cureus"},{"key":"B27","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1007\/s00296-023-05473-5","article-title":"Assessing the accuracy and completeness of artificial intelligence language models in providing information on methotrexate 
use","volume":"44","author":"Coskun","year":"2024","journal-title":"Rheumatol Int"},{"key":"B28","doi-asserted-by":"publisher","first-page":"108352","DOI":"10.1016\/j.chb.2024.108352","article-title":"Trust and reliance on AI\u2014an experimental study on the extent and costs of overreliance on AI","volume":"160","author":"Klingbeil","year":"2024","journal-title":"Comput Human Behav"},{"key":"B29","doi-asserted-by":"publisher","first-page":"337","DOI":"10.3390\/bioengineering11040337","article-title":"The role of AI in hospitals and clinics: transforming healthcare in the 21st century","volume":"11","author":"Maleki Varnosfaderani","year":"2024","journal-title":"Bioengineering (Basel)"},{"key":"B30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s43055-024-01356-2","article-title":"Explainability, transparency and black box challenges of AI in radiology: impact on patient care in cardiovascular radiology","volume":"55","author":"Marey","year":"2024","journal-title":"Egypt J Radiol Nucl Med"},{"key":"B31","doi-asserted-by":"publisher","first-page":"3","DOI":"10.3390\/sci6010003","article-title":"Fairness and bias in artificial intelligence: a brief survey of sources, impacts, and mitigation strategies","volume":"6","author":"Ferrara","year":"2024","journal-title":"Sci"},{"key":"B32","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1093\/rheumatology\/keae152","article-title":"British society for rheumatology guideline on management of adult and juvenile onset Sj\u00f6gren disease","volume":"64","author":"Price","year":"2025","journal-title":"Rheumatology (Oxford)"},{"key":"B33","doi-asserted-by":"publisher","first-page":"2088","DOI":"10.1002\/art.42646","article-title":"2022 American college of rheumatology guideline for the prevention and treatment of glucocorticoid-induced osteoporosis","volume":"75","author":"Humphrey","year":"2023","journal-title":"Arthritis Rheumatol"}],"container-title":["Frontiers in Digital 
Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1530442\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T17:21:49Z","timestamp":1741022509000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1530442\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,3]]},"references-count":33,"alternative-id":["10.3389\/fdgth.2025.1530442"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1530442","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,3]]},"article-number":"1530442"}}