{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T05:12:20Z","timestamp":1773378740815,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T00:00:00Z","timestamp":1766448000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Objectives<\/jats:title>\n                    <jats:p>To develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Materials and Methods<\/jats:title>\n                    <jats:p>Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT\u2013CONSORT\u2013TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, demonstrated strong accuracy (CONSORT: 90.09%; SPIRIT: 92.07%), substantial agreement with humans (CONSORT Cohen\u2019s \u03ba\u2009=\u20090.70, SPIRIT Cohen\u2019s \u03ba\u2009=\u20090.77), runtime (CONSORT: 617.26\u00a0s; SPIRIT: 544.51\u00a0s), and cost (CONSORT: 0.68 USD; SPIRIT: 0.65 USD). 
AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen\u2019s \u03ba\u2009&gt;\u20090.6) with expert ratings from the BenchReport benchmark.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>Large language models can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https:\/\/autoreporter.streamlit.app.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/jamia\/ocaf223","type":"journal-article","created":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T12:55:04Z","timestamp":1765284904000},"page":"724-731","source":"Crossref","is-referenced-by-count":0,"title":["AutoReporter: development of an artificial intelligence tool for automated assessment of research reporting guideline adherence"],"prefix":"10.1093","volume":"33","author":[{"given":"David","family":"Chen","sequence":"first","affiliation":[{"name":"Princess Margaret Cancer Centre, Radiation Medicine Program , Toronto, ON M5G 2C4,","place":["Canada"]},{"name":"Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 3K3,","place":["Canada"]}]},{"given":"Patrick","family":"Li","sequence":"additional","affiliation":[{"name":"Faculty of Engineering, McMaster University , Hamilton, ON L8S 4M3,","place":["Canada"]}]},{"given":"Ealia","family":"Khoshkish","sequence":"additional","affiliation":[{"name":"Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 
3K3,","place":["Canada"]}]},{"given":"Seungmin","family":"Lee","sequence":"additional","affiliation":[{"name":"Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 3K3,","place":["Canada"]}]},{"given":"Tony","family":"Ning","sequence":"additional","affiliation":[{"name":"Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 3K3,","place":["Canada"]}]},{"given":"Umair","family":"Tahir","sequence":"additional","affiliation":[{"name":"Temerty Faculty of Medicine, University of Toronto , Toronto, ON M5S 3K3,","place":["Canada"]}]},{"given":"Henry C Y","family":"Wong","sequence":"additional","affiliation":[{"name":"Department of Oncology, Princess Margaret Hospital , Hong Kong,","place":["China"]}]},{"given":"Michael S F","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Radiation Oncology, National University Cancer Institute, National University Hospital , Singapore,","place":["Singapore"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5688-9628","authenticated-orcid":false,"given":"Srinivas","family":"Raman","sequence":"additional","affiliation":[{"name":"Department of Radiation Oncology, BC Cancer , Vancouver, BC V5Z 4E6,","place":["Canada"]},{"name":"Division of Radiation Oncology, University of British Columbia , Vancouver, BC V5Z 1M9,","place":["Canada"]}]}],"member":"286","published-online":{"date-parts":[[2025,12,23]]},"reference":[{"key":"2026031216461398600_ocaf223-B1","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1038\/533452a","article-title":"1,500 scientists lift the lid on reproducibility","volume":"533","author":"Baker","year":"2016","journal-title":"Nature"},{"key":"2026031216461398600_ocaf223-B2","doi-asserted-by":"crossref","first-page":"429","DOI":"10.7326\/0003-4819-157-6-201209180-00537","article-title":"Influence of reported study design characteristics on intervention effect estimates from randomized, controlled 
trials","volume":"157","author":"Savovi\u0107","year":"2012","journal-title":"Ann Intern Med"},{"key":"2026031216461398600_ocaf223-B3","doi-asserted-by":"crossref","first-page":"c869","DOI":"10.1136\/bmj.c869","article-title":"CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials","volume":"340","author":"Moher","year":"2010","journal-title":"BMJ"},{"key":"2026031216461398600_ocaf223-B4","doi-asserted-by":"crossref","first-page":"200","DOI":"10.7326\/0003-4819-158-3-201302050-00583","article-title":"SPIRIT 2013 statement: defining standard protocol items for clinical trials","volume":"158","author":"Chan","year":"2013","journal-title":"Ann Intern Med"},{"key":"2026031216461398600_ocaf223-B5","doi-asserted-by":"crossref","first-page":"263","DOI":"10.5694\/j.1326-5377.2006.tb00557.x","article-title":"Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review","volume":"185","author":"Plint","year":"2006","journal-title":"Med J Aust"},{"key":"2026031216461398600_ocaf223-B6","doi-asserted-by":"crossref","first-page":"e038283","DOI":"10.1136\/bmjopen-2020-038283","article-title":"Has the reporting quality of published randomised controlled trial protocols improved since the SPIRIT statement? 
A methodological study","volume":"10","author":"Tan","year":"2020","journal-title":"BMJ Open"},{"key":"2026031216461398600_ocaf223-B7","doi-asserted-by":"crossref","first-page":"1619","DOI":"10.1038\/s41467-024-45355-3","article-title":"Concordance of randomised controlled trials for artificial intelligence interventions with the CONSORT-AI reporting guidelines","volume":"15","author":"Martindale","year":"2024","journal-title":"Nat Commun"},{"key":"2026031216461398600_ocaf223-B8","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.jclinepi.2019.05.030","article-title":"Consolidated Standards of Reporting Trials (CONSORT) extensions covered most types of randomized controlled trials, but the potential workload for authors was high","volume":"113","author":"Ghosn","year":"2019","journal-title":"J Clin Epidemiol"},{"key":"2026031216461398600_ocaf223-B9","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1186\/s13063-018-2475-0","article-title":"Are CONSORT checklists submitted by authors adequately reflecting what information is actually reported in published papers?","volume":"19","author":"Blanco","year":"2018","journal-title":"Trials"},{"key":"2026031216461398600_ocaf223-B10","doi-asserted-by":"crossref","first-page":"e036799","DOI":"10.1136\/bmjopen-2020-036799","article-title":"Effect of an editorial intervention to improve the completeness of reporting of randomised trials: a randomised controlled trial","volume":"10","author":"Blanco","year":"2020","journal-title":"BMJ Open"},{"key":"2026031216461398600_ocaf223-B11","doi-asserted-by":"crossref","first-page":"e2014661","DOI":"10.1001\/jamanetworkopen.2020.14661","article-title":"Development and validation of a natural language processing tool to generate the CONSORT reporting checklist for randomized clinical trials","volume":"3","author":"Wang","year":"2020","journal-title":"JAMA Netw 
Open"},{"key":"2026031216461398600_ocaf223-B12","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1016\/j.jclinepi.2020.11.003","article-title":"Machine learning reduced workload with minimal risk of missing studies: development and evaluation of a randomized controlled trial classifier for Cochrane Reviews","volume":"133","author":"Thomas","year":"2021","journal-title":"J Clin Epidemiol"},{"key":"2026031216461398600_ocaf223-B13","doi-asserted-by":"crossref","first-page":"1406","DOI":"10.1109\/JBHI.2015.2431314","article-title":"Automating risk of bias assessment for clinical trials","volume":"19","author":"Marshall","year":"2015","journal-title":"IEEE J Biomed Health Inform"},{"key":"2026031216461398600_ocaf223-B14","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1001\/jamainternmed.2023.1838","article-title":"Comparing physician and artificial intelligence Chatbot responses to patient questions posted to a public social media forum","volume":"183","author":"Ayers","year":"2023","journal-title":"JAMA Intern Med"},{"key":"2026031216461398600_ocaf223-B15","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1001\/jamaoncol.2024.0836","article-title":"Physician and artificial intelligence Chatbot responses to cancer questions from social media","volume":"10","author":"Chen","year":"2024","journal-title":"JAMA Oncol"},{"key":"2026031216461398600_ocaf223-B16","doi-asserted-by":"crossref","first-page":"103717","DOI":"10.1016\/j.jbi.2021.103717","article-title":"Toward assessing clinical trial publications for reporting transparency","volume":"116","author":"Kilicoglu","year":"2021","journal-title":"J Biomed Inform"},{"key":"2026031216461398600_ocaf223-B17","doi-asserted-by":"crossref","first-page":"21721","DOI":"10.1038\/s41598-024-72130-7","article-title":"Text classification models for assessing the completeness of randomized controlled trial publications based on CONSORT reporting 
guidelines","volume":"14","author":"Jiang","year":"2024","journal-title":"Sci Rep"},{"key":"2026031216461398600_ocaf223-B18","doi-asserted-by":"crossref","first-page":"e2529418","DOI":"10.1001\/jamanetworkopen.2025.29418","article-title":"Large language model analysis of reporting quality of randomized clinical trial articles: a systematic review","volume":"8","author":"Srinivasan","year":"2025","journal-title":"JAMA Netw Open"},{"key":"2026031216461398600_ocaf223-B19","doi-asserted-by":"crossref","first-page":"1340","DOI":"10.1093\/jamia\/ocaf093","article-title":"RAPID: Reliable and efficient Automatic generation of submission rePortIng checklists with large language moDels","volume":"32","author":"Li","year":"2025","journal-title":"J Am Med Inform Assoc"},{"key":"2026031216461398600_ocaf223-B20","doi-asserted-by":"crossref","first-page":"e088735","DOI":"10.1136\/bmjopen-2024-088735","article-title":"GPT for RCTs? Using AI to determine adherence to clinical trial reporting guidelines","volume":"15","author":"Wrightson","year":"2025","journal-title":"BMJ Open"},{"key":"2026031216461398600_ocaf223-B21","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1038\/s41597-025-04629-1","article-title":"SPIRIT-CONSORT-TM: a corpus for assessing transparency of clinical trial protocol and results publications","volume":"12","author":"Jiang","year":"2025","journal-title":"Sci Data"},{"key":"2026031216461398600_ocaf223-B22","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.jclinepi.2021.09.012","article-title":"Reporting transparency and completeness in trials: paper 2\u2014reporting of randomised trials using registries was often inadequate and hindered the interpretation of results","volume":"141","author":"Mc Cord","year":"2022","journal-title":"J Clin Epidemiol"},{"key":"2026031216461398600_ocaf223-B23","doi-asserted-by":"crossref","first-page":"e200342","DOI":"10.1148\/ryct.2020200342","article-title":"Suboptimal quality and high risk of bias in 
diagnostic test accuracy studies at chest radiography and CT in the acute setting of the COVID-19 pandemic: a systematic review","volume":"2","author":"Such\u00e1","year":"2020","journal-title":"Radiol Cardiothorac Imaging"},{"key":"2026031216461398600_ocaf223-B24","first-page":"1","article-title":"The quality of methodological and reporting in network meta-analysis of acupuncture and moxibustion: a cross-sectional survey","volume":"2021","author":"Yuan","year":"2021","journal-title":"Evid Based Complement Alternat Med"},{"key":"2026031216461398600_ocaf223-B25","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/s12874-021-01223-y","article-title":"There is still room for improvement in the completeness of abstract reporting according to the PRISMA-A checklist: a cross-sectional study on systematic reviews in periodontology","volume":"21","author":"Adobes Martin","year":"2021","journal-title":"BMC Med Res Methodol"},{"key":"2026031216461398600_ocaf223-B26","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.ijid.2020.09.1470","article-title":"Use of hydroxychloroquine and chloroquine in COVID-19: how good is the quality of randomized controlled trials?","volume":"101","author":"Mazhar","year":"2020","journal-title":"Int J Infect Dis"},{"key":"2026031216461398600_ocaf223-B27","doi-asserted-by":"crossref","first-page":"884573","DOI":"10.3389\/fmed.2022.884573","article-title":"Use of traditional, complementary and integrative medicine during the COVID-19 pandemic: a systematic review and meta-analysis","volume":"9","author":"Kim","year":"2022","journal-title":"Front Med (Lausanne)"},{"key":"2026031216461398600_ocaf223-B28","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1186\/s12954-021-00570-9","article-title":"Reporting and methodological quality of systematic literature reviews evaluating the associations between e-cigarette use and cigarette smoking behaviors: a systematic quality 
review","volume":"18","author":"Kim","year":"2021","journal-title":"Harm Reduct J"},{"key":"2026031216461398600_ocaf223-B29","doi-asserted-by":"crossref","first-page":"1037","DOI":"10.1177\/02698811211032445","article-title":"Pharmacological treatment for Tourette syndrome in children and adults: what is the quality of the evidence? A systematic review","volume":"35","author":"Besag","year":"2021","journal-title":"J Psychopharmacol"},{"key":"2026031216461398600_ocaf223-B30","author":"Hurst","year":"2024"},{"key":"2026031216461398600_ocaf223-B31","author":"Jaech","year":"2024"},{"key":"2026031216461398600_ocaf223-B32","doi-asserted-by":"crossref","first-page":"276","DOI":"10.11613\/BM.2012.031","article-title":"Interrater reliability: the kappa statistic","volume":"22","author":"McHugh","year":"2012","journal-title":"Biochem Med"},{"key":"2026031216461398600_ocaf223-B33","author":"Kojima","year":"2022"},{"key":"2026031216461398600_ocaf223-B34","doi-asserted-by":"crossref","DOI":"10.1056\/AIcs2400360","article-title":"Zero-shot clinical trial patient matching with LLMs","volume":"2","author":"Wornow","year":"2025","journal-title":"NEJM AI"},{"key":"2026031216461398600_ocaf223-B35","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1001\/jamainternmed.2024.0295","article-title":"Clinical reasoning of a generative artificial intelligence model compared with physicians","volume":"184","author":"Cabral","year":"2024","journal-title":"JAMA Intern Med"},{"key":"2026031216461398600_ocaf223-B36","author":"Pawar","year":"2024"},{"key":"2026031216461398600_ocaf223-B37","doi-asserted-by":"crossref","first-page":"389","DOI":"10.7326\/ANNALS-24-02189","article-title":"Development of prompt templates for large language model-driven screening in systematic reviews","volume":"178","author":"Cao","year":"2025","journal-title":"Ann Intern 
Med"},{"key":"2026031216461398600_ocaf223-B38","doi-asserted-by":"crossref","first-page":"14156","DOI":"10.1038\/s41598-024-64827-6","article-title":"OpenMedLM: prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models","volume":"14","author":"Maharjan","year":"2024","journal-title":"Sci Rep"},{"key":"2026031216461398600_ocaf223-B39","first-page":"478","article-title":"Comparison of prompt engineering and fine-tuning strategies in large language models in the classification of clinical notes","volume":"2024","author":"Zhang","year":"2024","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"key":"2026031216461398600_ocaf223-B40","doi-asserted-by":"crossref","first-page":"1812","DOI":"10.1093\/jamia\/ocad259","article-title":"Improving large language models for clinical named entity recognition via prompt engineering","volume":"31","author":"Hu","year":"2024","journal-title":"J Am Med Inform Assoc"},{"key":"2026031216461398600_ocaf223-B41","author":"Huang","year":"2022"},{"key":"2026031216461398600_ocaf223-B42","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s12874-019-0782-0","article-title":"Single screening versus conventional double screening for study selection in systematic reviews: a methodological systematic review","volume":"19","author":"Waffenschmidt","year":"2019","journal-title":"BMC Med Res Methodol"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/advance-article-pdf\/doi\/10.1093\/jamia\/ocaf223\/66119252\/ocaf223.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/33\/3\/724\/66119252\/ocaf223.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/33\/3\/724\/66119252\/ocaf223.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T20:46:20Z","timestamp":1773348380000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/33\/3\/724\/8403434"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,23]]},"references-count":42,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,12,23]]},"published-print":{"date-parts":[[2026,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaf223","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,3]]},"published":{"date-parts":[[2025,12,23]]}}}