{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T21:25:40Z","timestamp":1773523540213,"version":"3.50.1"},"reference-count":26,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T00:00:00Z","timestamp":1757030400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec><jats:title>Purpose<\/jats:title><jats:p>To evaluate the performance of five popular large language models (LLMs) in addressing cataract-related queries.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>This comparative evaluation study was conducted at the Eye and ENT Hospital of Fudan University. We performed both qualitative and quantitative assessments of responses from five LLMs: ChatGPT-4, ChatGPT-4o, Gemini, Copilot, and the open-source Llama 3.5. Model outputs were benchmarked against human-generated responses using seven key metrics: accuracy, completeness, conciseness, harmlessness, readability, stability, and self-correction capability. Additional inter-model comparisons were performed across question subgroups categorized by clinical topic type.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>In the information quality assessment, ChatGPT-4o demonstrated the best performance across most metrics, including accuracy score (6.70\u202f\u00b1\u202f0.63), completeness score (4.63\u202f\u00b1\u202f0.63), and harmlessness score (3.97\u202f\u00b1\u202f0.17). Gemini achieved the highest conciseness score (4.00\u202f\u00b1\u202f0.14). Further subgroup analysis showed that all LLMs performed comparably to or better than humans, regardless of the type of question posed. The readability assessment revealed that ChatGPT-4o had the lowest readability score (26.02\u202f\u00b1\u202f10.78), indicating the highest level of reading difficulty. While Copilot recorded a higher readability score (40.26\u202f\u00b1\u202f14.58) than the other LLMs, it still remained lower than that of humans (51.54\u202f\u00b1\u202f13.71). Copilot also exhibited the best stability in reproducibility and stability assessment. All LLMs demonstrated strong self-correction capability when prompted.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Our study suggested that LLMs exhibited considerable potential in providing accurate and comprehensive responses to common cataract-related clinical issues. Notably, ChatGPT-4o achieved the best scores in accuracy, completeness, and harmlessness. Despite these promising results, clinicians and patients should be aware of the limitations of artificial intelligence (AI) to ensure critical evaluation in clinical practice.<\/jats:p><\/jats:sec>","DOI":"10.3389\/frai.2025.1639221","type":"journal-article","created":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T10:40:22Z","timestamp":1757068822000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Transforming cataract care through artificial intelligence: an evaluation of large language models\u2019 performance in addressing cataract-related queries"],"prefix":"10.3389","volume":"8","author":[{"given":"Xinyue","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yan","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linghao","family":"Song","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yinuo","family":"Wen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shenjie","family":"Peng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruoxi","family":"Ren","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianhui","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongxiang","family":"Jiang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2025,9,5]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","first-page":"1549","DOI":"10.1007\/s00259-023-06172-w","article-title":"Large language models (LLM) and ChatGPT: what will the impact on nuclear medicine be?","volume":"50","author":"Alberts","year":"2023","journal-title":"Eur. J. Nucl. Med. Mol. Imaging"},{"key":"ref2","doi-asserted-by":"publisher","first-page":"e35179","DOI":"10.7759\/cureus.35179","article-title":"Artificial hallucinations in ChatGPT: implications in scientific writing","volume":"15","author":"Alkaissi","year":"2023","journal-title":"Cureus"},{"key":"ref3","doi-asserted-by":"publisher","first-page":"1371","DOI":"10.1136\/bjo-2023-324438","article-title":"Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering","volume":"108","author":"Antaki","year":"2024","journal-title":"Br. J. Ophthalmol."},{"key":"ref4","doi-asserted-by":"publisher","first-page":"e2330320","DOI":"10.1001\/jamanetworkopen.2023.30320","article-title":"Comparison of ophthalmologist and large language model chatbot responses to online patient eye care questions","volume":"6","author":"Bernstein","year":"2023","journal-title":"JAMA Netw. Open"},{"key":"ref5","doi-asserted-by":"publisher","first-page":"1459","DOI":"10.1001\/jamaoncol.2023.2954","article-title":"Use of artificial intelligence chatbots for Cancer treatment information","volume":"9","author":"Chen","year":"2023","journal-title":"JAMA Oncol."},{"key":"ref6","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1038\/s43856-023-00370-1","article-title":"The future landscape of large language models in medicine","volume":"3","author":"Clusmann","year":"2023","journal-title":"Commun. Med."},{"key":"ref7","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1016\/j.ajo.2024.04.004","article-title":"Using large language models to generate educational materials on childhood Glaucoma","volume":"265","author":"Dihan","year":"2024","journal-title":"Am. J. Ophthalmol."},{"key":"ref8","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1037\/h0057532","article-title":"A new readability yardstick","volume":"32","author":"Flesch","year":"1948","journal-title":"J. Appl. Psychol."},{"key":"ref9","doi-asserted-by":"publisher","first-page":"e2336483","DOI":"10.1001\/jamanetworkopen.2023.36483","article-title":"Accuracy and reliability of chatbot responses to physician questions","volume":"6","author":"Goodman","year":"2023","journal-title":"JAMA Netw. Open"},{"key":"ref10","doi-asserted-by":"publisher","first-page":"NP1078","DOI":"10.1093\/asj\/sjad128","article-title":"Performance of ChatGPT on the plastic surgery Inservice training examination","volume":"43","author":"Gupta","year":"2023","journal-title":"Aesthet. Surg. J."},{"key":"ref11","doi-asserted-by":"publisher","first-page":"424","DOI":"10.1097\/j.jcrs.0000000000001345","article-title":"Performance of ChatGPT in cataract surgery counseling","volume":"50","author":"Gupta","year":"2024","journal-title":"J. Cataract Refract. Surg."},{"key":"ref12","doi-asserted-by":"publisher","first-page":"3395","DOI":"10.1007\/s40123-023-00789-8","article-title":"What can GPT-4 do for diagnosing rare eye diseases? A pilot study","volume":"12","author":"Hu","year":"2023","journal-title":"Ophthalmol. Ther."},{"key":"ref13","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1001\/jamaophthalmol.2023.6917","article-title":"Assessment of a large language model's responses to questions and cases about Glaucoma and retina management","volume":"142","author":"Huang","year":"2024","journal-title":"JAMA Ophthalmol."},{"key":"ref14","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref15","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"key":"ref16","doi-asserted-by":"publisher","first-page":"104770","DOI":"10.1016\/j.ebiom.2023.104770","article-title":"Benchmarking large language models' performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google bard","volume":"95","author":"Lim","year":"2023","journal-title":"EBioMedicine"},{"key":"ref17","doi-asserted-by":"publisher","first-page":"e40822","DOI":"10.7759\/cureus.40822","article-title":"Artificial intelligence in ophthalmology: a comparative analysis of GPT-3.5, GPT-4, and human expertise in answering StatPearls questions","volume":"15","author":"Moshirfar","year":"2023","journal-title":"Cureus"},{"key":"ref18","doi-asserted-by":"publisher","first-page":"387","DOI":"10.3109\/09286586.2015.1066016","article-title":"The effect of Counseling on cataract patient knowledge, decisional conflict, and satisfaction","volume":"22","author":"Newman-Casey","year":"2015","journal-title":"Ophthalmic Epidemiol."},{"key":"ref19","doi-asserted-by":"publisher","first-page":"108163","DOI":"10.1016\/j.isci.2023.108163","article-title":"Popular large language model chatbots' accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries","volume":"26","author":"Pushpanathan","year":"2023","journal-title":"iScience"},{"key":"ref20","doi-asserted-by":"publisher","first-page":"1979","DOI":"10.2147\/opth.S146135","article-title":"Anxiety in patients undergoing cataract surgery: a pre- and postoperative comparison","volume":"11","author":"Ramirez","year":"2017","journal-title":"Clin. Ophthalmol."},{"key":"ref21","doi-asserted-by":"publisher","first-page":"2050","DOI":"10.1038\/s41467-024-46411-8","article-title":"Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks","volume":"15","author":"Sandmann","year":"2024","journal-title":"Nat. Commun."},{"key":"ref22","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1016\/j.mcpdig.2024.01.003","article-title":"Appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model","volume":"2","author":"Tailor","year":"2024","journal-title":"Mayo Clin. Proc. Digit. Health"},{"key":"ref23","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","article-title":"Large language models in medicine","volume":"29","author":"Thirunavukarasu","year":"2023","journal-title":"Nat. Med."},{"key":"ref24","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1038\/d41586-023-00288-7","article-title":"ChatGPT: five priorities for research","volume":"614","author":"van Dis","year":"2023","journal-title":"Nature"},{"key":"ref25","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1001\/jamaophthalmol.2023.6937","article-title":"Large language models and the shoreline of ophthalmology","volume":"142","author":"Young","year":"2024","journal-title":"JAMA Ophthalmol."},{"key":"ref26","doi-asserted-by":"publisher","first-page":"736","DOI":"10.1186\/s12889-021-10801-0","article-title":"Online health information-seeking behaviors and skills of Chinese college students","volume":"21","author":"Zhang","year":"2021","journal-title":"BMC Public Health"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1639221\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T10:40:24Z","timestamp":1757068824000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1639221\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,5]]},"references-count":26,"alternative-id":["10.3389\/frai.2025.1639221"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1639221","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,5]]},"article-number":"1639221"}}