{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T13:38:57Z","timestamp":1781357937113,"version":"3.54.1"},"reference-count":24,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Digit. Health"],"abstract":"<jats:sec><jats:title>Background<\/jats:title><jats:p>Concise synopses of clinical evidence support treatment decision-making but are time-consuming to curate. Large language models (LLMs) offer potential but they may provide inaccurate information. We objectively assessed the abilities of four commercially available LLMs to generate synopses for six treatment regimens in multiple myeloma and amyloid light chain (AL) amyloidosis.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>We compared the performance of four LLMs: Claude 3.5, ChatGPT 4.0, Gemini 1.0, and Llama-3.1. Each LLM was prompted to write synopses for six regimens. Two hematologists independently assessed accuracy, completeness, relevance, clarity, coherence, and hallucinations using Likert scales. 
Mean scores with 95% confidence intervals (CI) were calculated across all domains and inter-rater reliability was evaluated using Cohen's quadratic weighted kappa.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Claude demonstrated the highest performance in all domains, outperforming the other LLMs in accuracy: mean Likert score 3.92 (95% CI 3.54\u20134.29); ChatGPT 3.25 (2.76\u20133.74); Gemini 3.17 (2.54\u20133.80); Llama 1.92 (1.41\u20132.43); completeness: mean Likert score 4.00 (3.66\u20134.34); ChatGPT 2.58 (2.02\u20133.15); Gemini 2.58 (2.02\u20133.15); Llama 1.67 (1.39\u20131.95); and extent of hallucinations: mean Likert score 4.00 (4.00\u20134.00); ChatGPT 2.75 (2.06\u20133.44); Gemini 3.25 (2.65\u20133.85); Llama 1.92 (1.26\u20132.57). Llama performed considerably worse across all the studied domains. ChatGPT and Gemini had intermediate performance. Notably, none of the LLMs registered perfect accuracy, completeness, or relevance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Claude performed at a consistently higher level than other LLMs; all tested LLMs required careful editing from a domain expert to become usable. 
More time will be needed to determine the suitability of LLMs to independently generate clinical synopses.<\/jats:p><\/jats:sec>","DOI":"10.3389\/fdgth.2025.1569554","type":"journal-article","created":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T05:25:22Z","timestamp":1745904322000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Summarizing clinical evidence utilizing large language models for cancer treatments: a blinded comparative analysis"],"prefix":"10.3389","volume":"7","author":[{"given":"Samuel","family":"Rubinstein","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Aleenah","family":"Mohsin","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rahul","family":"Banerjee","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Will","family":"Ma","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sanjay","family":"Mishra","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mary","family":"Kwok","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter","family":"Yang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jeremy L.","family":"Warner","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andrew J.","family":"Cowan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,4,29]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1186\/1748-5908-7-60","article-title":"Developing clinical practice guidelines: target audiences, identifying topics for 
guidelines, guideline group composition and functioning and conflicts of interest","volume":"7","author":"Eccles","year":"2012","journal-title":"Implement Sci"},{"key":"B2","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1186\/s13643-024-02575-4","article-title":"Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain","volume":"13","author":"Dennst\u00e4dt","year":"2024","journal-title":"Syst Rev"},{"key":"B3","doi-asserted-by":"publisher","first-page":"1302","DOI":"10.1681\/ASN.0000000000000166","article-title":"Retrieve, summarize, and verify: how will ChatGPT affect information seeking from the medical literature?","volume":"34","author":"Jin","year":"2023","journal-title":"J Am Soc Nephrol"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1134","DOI":"10.1038\/s41591-024-02855-5","article-title":"Adapted large language models can outperform medical experts in clinical text summarization","volume":"30","author":"Veen","year":"2024","journal-title":"Nat Med"},{"key":"B5","doi-asserted-by":"publisher","first-page":"2294","DOI":"10.1093\/jamia\/ocae186","article-title":"Leveraging artificial intelligence to summarize abstracts in lay language for increasing research accessibility and transparency","volume":"31","author":"Shyr","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"B6","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1370\/afm.3075","article-title":"Quality, accuracy, and bias in ChatGPT-based summarization of medical abstracts","volume":"22","author":"Hake","year":"2024","journal-title":"Ann Fam Med"},{"key":"B7","doi-asserted-by":"publisher","first-page":"210","DOI":"10.7326\/M23-2772","article-title":"Large language models in medicine: the potentials and pitfalls: a narrative review","volume":"177","author":"Omiye","year":"2024","journal-title":"Ann Intern 
Med"},{"key":"B8","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1038\/s41746-023-00896-7","article-title":"Evaluating large language models on medical evidence summarization","volume":"6","author":"Tang","year":"2023","journal-title":"npj Digit Med"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1597","DOI":"10.1056\/NEJMoa2400712","article-title":"Isatuximab, bortezomib, lenalidomide, and dexamethasone for multiple myeloma","volume":"391","author":"Facon","year":"2024","journal-title":"N Engl J Med"},{"key":"B10","doi-asserted-by":"publisher","first-page":"e336","DOI":"10.1200\/JOP.2014.001511","article-title":"Hemonc.org: a collaborative online knowledge platform for oncology professionals","volume":"11","author":"Warner","year":"2015","journal-title":"J Oncol Pract"},{"key":"B11","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1038\/s41408-024-01088-6","article-title":"Daratumumab in transplant-eligible patients with newly diagnosed multiple myeloma: final analysis of clinically relevant subgroups in GRIFFIN","volume":"14","author":"Chari","year":"2024","journal-title":"Blood Cancer J"},{"key":"B12","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1056\/NEJMoa2312054","article-title":"Daratumumab, bortezomib, lenalidomide, and dexamethasone for multiple myeloma","volume":"390","author":"Sonneveld","year":"2024","journal-title":"N Engl J Med"},{"key":"B13","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1056\/NEJMoa1411321","article-title":"Carfilzomib, lenalidomide, and dexamethasone for relapsed multiple myeloma","volume":"372","author":"Stewart","year":"2015","journal-title":"N Engl J Med"},{"key":"B14","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1111\/j.1365-2141.2007.06639.x","article-title":"Incorporating bortezomib into upfront treatment for multiple myeloma: early results of total therapy 3","volume":"138","author":"Barlogie","year":"2007","journal-title":"Br J 
Haematol"},{"key":"B15","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1056\/NEJMoa2028631","article-title":"Daratumumab-based treatment for immunoglobulin light-chain amyloidosis","volume":"385","author":"Kastritis","year":"2021","journal-title":"N Engl J Med"},{"key":"B16","doi-asserted-by":"publisher","first-page":"2259","DOI":"10.1038\/s41591-023-02528-9","article-title":"Elranatamab in relapsed or refractory multiple myeloma: phase 2 magnetisMM-3 trial results","volume":"29","author":"Lesokhin","year":"2023","journal-title":"Nat Med"},{"key":"B17","doi-asserted-by":"publisher","first-page":"2232","DOI":"10.1056\/NEJMoa2204591","article-title":"Talquetamab, a T-cell-redirecting GPRC5D bispecific antibody for multiple myeloma","volume":"387","author":"Chari","year":"2022","journal-title":"N Engl J Med"},{"key":"B18","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1016\/j.jbi.2008.08.010","article-title":"Research electronic data capture (REDCap)\u2014a metadata-driven methodology and workflow process for providing translational research informatics support","volume":"42","author":"Harris","year":"2009","journal-title":"J Biomed Inform"},{"key":"B19","doi-asserted-by":"publisher","first-page":"577","DOI":"10.1001\/archinternmed.2011.96","article-title":"Conflicts of interest in cardiovascular clinical practice guidelines","volume":"171","author":"Mendelson","year":"2011","journal-title":"Arch Intern Med"},{"key":"B20","doi-asserted-by":"publisher","first-page":"m4234","DOI":"10.1136\/bmj.m4234","article-title":"Association between conflicts of interest and favourable recommendations in clinical guidelines, advisory committee reports, opinion pieces, and narrative reviews: systematic review","volume":"371","author":"Nejstgaard","year":"2020","journal-title":"Br Med J"},{"key":"B21","volume-title":"arXiv","author":"Wu","year":"2024"},{"key":"B22","first-page":"1","article-title":"Comparative analysis of ChatGPT-4 and LLaMA: performance 
evaluation on text summarization, data analysis, and question answering","author":"Bogireddy","year":"2024"},{"key":"B23","volume-title":"arXiv","author":"Janakiraman","year":"2025"},{"key":"B24","volume-title":"arXiv","author":"Li","year":"2024"}],"container-title":["Frontiers in Digital Health"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1569554\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T05:25:23Z","timestamp":1745904323000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdgth.2025.1569554\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,29]]},"references-count":24,"alternative-id":["10.3389\/fdgth.2025.1569554"],"URL":"https:\/\/doi.org\/10.3389\/fdgth.2025.1569554","relation":{},"ISSN":["2673-253X"],"issn-type":[{"value":"2673-253X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,29]]},"article-number":"1569554"}}