{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,10]],"date-time":"2026-07-10T23:52:41Z","timestamp":1783727561229,"version":"3.55.0"},"reference-count":47,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T00:00:00Z","timestamp":1750982400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Artificial Intelligence (AI) chatbots, which generate human-like responses based on extensive data, are becoming important tools in healthcare by providing information on health conditions, treatments, and preventive measures, acting as virtual assistants. However, their performance in aligning with clinical practice guidelines (CPGs) for providing answers to complex clinical questions on lumbosacral radicular pain is still unclear. We aim to evaluate AI chatbots' performance against CPG recommendations for diagnosing and treating lumbosacral radicular pain.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>We performed a cross-sectional study to assess AI chatbots' responses against CPGs recommendations for diagnosing and treating lumbosacral radicular pain. Clinical questions based on these CPGs were posed to the latest versions (updated in 2024) of six AI chatbots: ChatGPT-3.5, ChatGPT-4o, Microsoft Copilot, Google Gemini, Claude, and Perplexity. The chatbots' responses were evaluated for (a) consistency of text responses using Plagiarism Checker X, (b) intra- and inter-rater reliability using Fleiss' Kappa, and (c) match rate with CPGs. Statistical analyses were performed with STATA\/MP 16.1.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We found high variability in the text consistency of AI chatbot responses (median range 26%\u201368%). Intra-rater reliability ranged from \u201calmost perfect\u201d to \u201csubstantial,\u201d while inter-rater reliability varied from \u201calmost perfect\u201d to \u201cmoderate.\u201d Perplexity had the highest match rate at 67%, followed by Google Gemini at 63%, and Microsoft Copilot at 44%. ChatGPT-3.5, ChatGPT-4o, and Claude showed the lowest performance, each with a 33% match rate.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Despite the variability in internal consistency and good intra- and inter-rater reliability, the AI Chatbots' recommendations often did not align with CPGs recommendations for diagnosing and treating lumbosacral radicular pain. Clinicians and patients should exercise caution when relying on these AI models, since one to two-thirds of the recommendations provided may be inappropriate or misleading according to specific chatbots.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fdgth.2025.1574287","type":"journal-article","created":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T05:33:15Z","timestamp":1751002395000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Accuracy of ChatGPT-3.5, ChatGPT-4o, Copilot, Gemini, Claude, and Perplexity in advising on lumbosacral radicular pain against clinical practice guidelines: cross-sectional study"],"prefix":"10.3389","volume":"7","author":[{"given":"Giacomo","family":"Rossettini","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Silvia","family":"Bargeri","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chad","family":"Cook","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stefania","family":"Guida","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alvisa","family":"Palese","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lia","family":"Rodeghiero","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Paolo","family":"Pillastrini","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andrea","family":"Turolla","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Greta","family":"Castellini","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Silvia","family":"Gianola","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,6,27]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1038\/s41591-023-02448-8","article-title":"Large language models in medicine","volume":"29","author":"Thirunavukarasu","year":"2023","journal-title":"Nat Med"},{"key":"B2","doi-asserted-by":"publisher","first-page":"e94","DOI":"10.1016\/S2589-7500(24)00202-4","article-title":"Attitudes and perceptions of medical researchers towards the use of artificial intelligence chatbots in the scientific process: an international cross-sectional survey","volume":"7","author":"Ng","year":"2025","journal-title":"Lancet Digit Health"},{"key":"B3","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1038\/s43856-023-00370-1","article-title":"The future landscape of large language models in medicine","volume":"3","author":"Clusmann","year":"2023","journal-title":"Commun Med"},{"key":"B4","doi-asserted-by":"publisher","first-page":"728","DOI":"10.2519\/jospt.2023.12000","article-title":"Pros and cons of using artificial intelligence chatbots for musculoskeletal rehabilitation management","volume":"53","author":"Rossettini","year":"2023","journal-title":"J Orthop Sports Phys Ther"},{"key":"B5","doi-asserted-by":"publisher","first-page":"72","DOI":"10.1186\/s12911-024-02459-6","article-title":"Assessing the research landscape and clinical utility of large language models: a scoping review","volume":"24","author":"Park","year":"2024","journal-title":"BMC Med Inform Decis Mak"},{"key":"B6","doi-asserted-by":"publisher","first-page":"e0296151","DOI":"10.1371\/journal.pone.0296151","article-title":"Exploring factors influencing user perspective of ChatGPT as a technology that assists in healthcare decision making: a cross sectional survey study","volume":"19","author":"Choudhury","year":"2024","journal-title":"PLoS One"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1169595","DOI":"10.3389\/frai.2023.1169595","article-title":"ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations","volume":"6","author":"Dave","year":"2023","journal-title":"Front Artif Intell"},{"key":"B8","doi-asserted-by":"publisher","first-page":"e49368","DOI":"10.2196\/49368","article-title":"A SWOT (strengths, weaknesses, opportunities, and threats) analysis of ChatGPT in the medical literature: concise review","volume":"25","author":"G\u00f6dde","year":"2023","journal-title":"J Med Internet Res"},{"key":"B9","doi-asserted-by":"publisher","first-page":"104620","DOI":"10.1016\/j.jbi.2024.104620","article-title":"Evaluation of ChatGPT-generated medical responses: a systematic review and meta-analysis","volume":"151","author":"Wei","year":"2024","journal-title":"J Biomed Inform"},{"key":"B10","doi-asserted-by":"publisher","first-page":"4182","DOI":"10.1007\/s00586-024-08198-6","article-title":"ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis","volume":"33","author":"Ahmed","year":"2024","journal-title":"Eur Spine J"},{"key":"B11","doi-asserted-by":"publisher","first-page":"149","DOI":"10.14245\/ns.2347052.526","article-title":"Use of ChatGPT for determining clinical and surgical treatment of lumbar disc herniation with radiculopathy: a north American spine society guideline comparison","volume":"21","author":"Mejia","year":"2024","journal-title":"Neurospine"},{"key":"B12","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1097\/BRS.0000000000004915","article-title":"Performance of ChatGPT on NASS clinical guidelines for the diagnosis and treatment of low back pain: a comparison study","volume":"49","author":"Shrestha","year":"2024","journal-title":"Spine"},{"key":"B13","doi-asserted-by":"publisher","first-page":"222","DOI":"10.2519\/jospt.2024.12151","article-title":"Performance of ChatGPT compared to clinical practice guidelines in making informed decisions for lumbosacral radicular pain: a cross-sectional study","volume":"54","author":"Gianola","year":"2024","journal-title":"J Orthop Sports Phys Ther"},{"key":"B14","doi-asserted-by":"publisher","first-page":"2482","DOI":"10.3390\/jcm10112482","article-title":"Recommendations for diagnosis and treatment of lumbosacral radicular pain: a systematic review of clinical practice guidelines","volume":"10","author":"Khorami","year":"2021","journal-title":"J Clin Med"},{"key":"B15","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1016\/j.jclinepi.2007.11.008","article-title":"The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies","volume":"61","author":"von Elm","year":"2008","journal-title":"J Clin Epidemiol"},{"key":"B16","doi-asserted-by":"publisher","first-page":"924","DOI":"10.1038\/s41591-022-01772-9","article-title":"Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI","volume":"28","author":"Vasey","year":"2022","journal-title":"Nat Med"},{"key":"B17","doi-asserted-by":"publisher","first-page":"b450","DOI":"10.1136\/bmj.b450","article-title":"Guide to ethical approval","volume":"338","author":"Nowell","year":"2009","journal-title":"Br Med J"},{"key":"B18","article-title":"Institute of Medicine (US) Committee on Standards for Developing Trustworthy Clinical Practice Guidelines. Clinical Practice Guidelines We Can Trust","author":"Graham","year":"2011"},{"key":"B19","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/s41073-022-00122-0","article-title":"ACCORD guideline for reporting consensus-based methods in biomedical research and clinical practice: a study protocol","volume":"7","author":"Gattrell","year":"2022","journal-title":"Res Integr Peer Rev"},{"key":"B20","article-title":"Hello GPT-4o","year":""},{"key":"B21","article-title":"Microsoft Copilot: il tuo AI Companion quotidiano. Microsoft Copilot: il tuo AI Companion quotidiano","year":""},{"key":"B22","article-title":"Gemini: chatta per espandere le tue idee. Gemini","year":""},{"key":"B23","article-title":"Claude","year":""},{"key":"B24","article-title":"Perplexity","year":""},{"key":"B25","doi-asserted-by":"publisher","first-page":"737","DOI":"10.1016\/j.jclinepi.2010.02.006","article-title":"The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes","volume":"63","author":"Mokkink","year":"2010","journal-title":"J Clin Epidemiol"},{"key":"B26","article-title":"The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. 1573\u20132649 (Electronic)","author":"Mokkink","year":""},{"key":"B27","article-title":"Plagiarism checker X - text similarity detector. Plagiarism checker X","year":""},{"key":"B28","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1007\/s00769-006-0191-z","article-title":"Understanding the meaning of accuracy, trueness and precision","volume":"12","author":"Menditto","year":"2007","journal-title":"Accredit Qual Assur"},{"key":"B29","doi-asserted-by":"publisher","first-page":"2629","DOI":"10.1007\/s10439-023-03272-4","article-title":"Prompt engineering with ChatGPT: a guide for academic writers","volume":"51","author":"Giray","year":"2023","journal-title":"Ann Biomed Eng"},{"key":"B30","volume-title":"SPSS for Windows Step by Step: A Simple Guide and Reference, 17.0 Update","author":"George","year":"2010"},{"key":"B31","first-page":"80","volume-title":"Biostatistics: The Bare Essentials","author":"Norman","year":"2008"},{"key":"B32","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"key":"B33","doi-asserted-by":"publisher","first-page":"e57795","DOI":"10.7759\/cureus.57795","article-title":"Redefining healthcare with artificial intelligence (AI): the contributions of ChatGPT, Gemini, and co-pilot","volume":"16","author":"Alhur","year":"2024","journal-title":"Cureus"},{"key":"B34","doi-asserted-by":"publisher","first-page":"457","DOI":"10.20517\/ir.2024.27","article-title":"A survey of datasets in medicine for large language models","volume":"4","author":"Zhang","year":"2024","journal-title":"Intell Robot"},{"key":"B35","doi-asserted-by":"publisher","first-page":"e106","DOI":"10.2196\/jmir.4126","article-title":"Access to care and use of the internet to search for health information: results from the US national health interview survey","volume":"17","author":"Amante","year":"2015","journal-title":"J Med Internet Res"},{"key":"B36","doi-asserted-by":"publisher","first-page":"e0228786","DOI":"10.1371\/journal.pone.0228786","article-title":"Situating Wikipedia as a health information resource in various contexts: a scoping review","volume":"15","author":"Smith","year":"2020","journal-title":"PLoS One"},{"key":"B37","doi-asserted-by":"publisher","first-page":"e262","DOI":"10.2196\/jmir.3706","article-title":"Dr Google and the consumer: a qualitative study exploring the navigational needs and online health information-seeking behaviors of consumers with chronic health conditions","volume":"16","author":"Lee","year":"2014","journal-title":"J Med Internet Res"},{"key":"B38","doi-asserted-by":"publisher","first-page":"355","DOI":"10.23736\/S0390-5616.20.05243-1","article-title":"What is the quality of the information available on the internet for patients suffering with sciatica?","volume":"67","author":"Mancuso-Marcello","year":"2023","journal-title":"J Neurosurg Sci"},{"key":"B39","doi-asserted-by":"publisher","first-page":"1166120","DOI":"10.3389\/fpubh.2023.1166120","article-title":"ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health","volume":"11","author":"De Angelis","year":"2023","journal-title":"Front Public Health"},{"key":"B40","doi-asserted-by":"publisher","first-page":"2320","DOI":"10.1007\/s00464-024-10807-w","article-title":"The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease","volume":"38","author":"Huo","year":"2024","journal-title":"Surg Endosc"},{"key":"B41","doi-asserted-by":"publisher","first-page":"e40895","DOI":"10.7759\/cureus.40895","article-title":"Chatdoctor: a medical chat model fine-tuned on a large language model meta-AI (LLaMA) using medical domain knowledge","volume":"15","author":"Li","year":"2023","journal-title":"Cureus"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.1056\/AIoa2300068","article-title":"Almanac - retrieval-augmented language models for clinical medicine","volume":"1","author":"Zakka","year":"2024","journal-title":"NEJM AI"},{"key":"B43","doi-asserted-by":"publisher","first-page":"1833","DOI":"10.1093\/jamia\/ocae045","article-title":"PMC-LLaMA: toward building open-source language models for medicine","volume":"31","author":"Wu","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"B44","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1186\/s12909-025-07176-w","article-title":"Knowledge and use, perceptions of benefits and limitations of artificial intelligence chatbots among Italian physiotherapy students: a cross-sectional national study","volume":"25","author":"Tortella","year":"2025","journal-title":"BMC Med Educ"},{"key":"B45","doi-asserted-by":"publisher","first-page":"397","DOI":"10.23736\/S2784-8469.24.04517-6","article-title":"Artificial intelligence chatbots in musculoskeletal rehabilitation: change is knocking at the door","volume":"75","author":"Rossettini","year":"2024","journal-title":"Minerva Orthop"},{"key":"B46","article-title":"DeepSeek","year":""},{"key":"B47","doi-asserted-by":"publisher","first-page":"2988","DOI":"10.1038\/s41591-023-02656-2","article-title":"Reporting standards for the use of large language model-linked chatbots for health advice","volume":"29","author":"Huo","year":"2023","journal-title":"Nat Med"}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1574287\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T05:33:16Z","timestamp":1751002396000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1574287\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,27]]},"references-count":47,"alternative-id":["10.3389\/fdgth.2025.1574287"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1574287","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,27]]},"article-number":"1574287"}}