{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T22:54:16Z","timestamp":1781736856639,"version":"3.54.5"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2024,7,4]],"date-time":"2024-07-04T00:00:00Z","timestamp":1720051200000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1741306"],"award-info":[{"award-number":["IIS-1741306"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-2235548"],"award-info":[{"award-number":["IIS-2235548"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000005","name":"Department of Defense","doi-asserted-by":"publisher","award":["DoD W91XWH-05-1-023"],"award-info":[{"award-number":["DoD W91XWH-05-1-023"]}],"id":[{"id":"10.13039\/100000005","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>To investigate approaches of reasoning with large language models (LLMs) and to propose a new prompting approach, ensemble reasoning, to improve medical question answering performance with refined reasoning and reduced inconsistency.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>We used multiple choice questions from the USMLE 
Sample Exam question files on 2 closed-source commercial and 1 open-source clinical LLM to evaluate our proposed approach, ensemble reasoning.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>On GPT-3.5 turbo and Med42-70B, our proposed ensemble reasoning approach outperformed zero-shot chain-of-thought with self-consistency on Steps 1, 2, and 3 questions (+3.44%, +4.00%, and +2.54%) and (+2.3%, +5.00%, and +4.15%), respectively. With GPT-4 turbo, there were mixed results, with ensemble reasoning again outperforming zero-shot chain-of-thought with self-consistency on Step 1 questions (+1.15%). In all cases, the results demonstrated improved consistency of responses with our approach. A qualitative analysis of the reasoning from the model demonstrated that the ensemble reasoning approach produces correct and helpful reasoning.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>The proposed iterative ensemble reasoning has the potential to improve the performance of LLMs in medical question answering tasks, particularly with the less powerful LLMs like GPT-3.5 turbo and Med42-70B, which may suggest that this is a promising approach for LLMs with lower capabilities. Additionally, the findings show that our approach helps to refine the reasoning generated by the LLM and thereby improve consistency even with the more powerful GPT-4 turbo. 
We also identify the potential and need for human-artificial intelligence teaming to improve the reasoning beyond the limits of the model.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocae131","type":"journal-article","created":{"date-parts":[[2024,7,4]],"date-time":"2024-07-04T01:43:37Z","timestamp":1720057417000},"page":"1964-1975","source":"Crossref","is-referenced-by-count":57,"title":["Reasoning with large language models for medical question answering"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0413-7499","authenticated-orcid":false,"given":"Mary M","family":"Lucas","sequence":"first","affiliation":[{"name":"College of Computing and Informatics, Drexel University , Philadelphia, PA 19104,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Justin","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Maryland , College Park, MD 20742,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jon K","family":"Pomeroy","sequence":"additional","affiliation":[{"name":"College of Computing and Informatics, Drexel University , Philadelphia, PA 19104,","place":["United States"]},{"name":"Penn Medicine , Philadelphia, PA 19104,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Christopher C","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Computing and Informatics, Drexel University , Philadelphia, PA 19104,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,7,3]]},"reference":[{"issue":"1","key":"2025070220014691400_ocae131-B1","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1055\/s-0042-1742510","article-title":"Natural language processing: from bedside to 
everywhere","volume":"31","author":"Aramaki","year":"2022","journal-title":"Yearbook Med Informat"},{"issue":"5","key":"2025070220014691400_ocae131-B2","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","article-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead","volume":"1","author":"Rudin","year":"2019","journal-title":"Nature Mach Intell"},{"key":"2025070220014691400_ocae131-B3","doi-asserted-by":"crossref","DOI":"10.1007\/s10506-023-09356-9","article-title":"The black box problem revisited. Real and imaginary challenges for automated legal decision making","author":"Bro\u017cek","year":"2024;32:427-440.","journal-title":"Artif Intell Law"},{"key":"2025070220014691400_ocae131-B4","volume-title":"Nat Rev Psychol","author":"Frank","year":"2023"},{"issue":"12","key":"2025070220014691400_ocae131-B5","doi-asserted-by":"crossref","first-page":"e855","DOI":"10.1016\/S2589-7500(23)00202-9","article-title":"Using fine-tuned large language models to parse clinical notes in musculoskeletal pain disorders","volume":"5","author":"Vaid","year":"2023","journal-title":"Lancet Digital Health"},{"issue":"4","key":"2025070220014691400_ocae131-B6","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1016\/S1473-3099(23)00113-5","article-title":"ChatGPT and antimicrobial advice: the end of the consulting infection doctor?","volume":"23","author":"Howard","year":"2023","journal-title":"Lancet Infectious Dis"},{"issue":"7","key":"2025070220014691400_ocae131-B7","doi-asserted-by":"crossref","first-page":"1237","DOI":"10.1093\/jamia\/ocad072","article-title":"Using AI-generated suggestions from ChatGPT to optimize clinical decision support","volume":"30","author":"Liu","year":"2023","journal-title":"J Am Med Informat Assoc"},{"issue":"1","key":"2025070220014691400_ocae131-B8","doi-asserted-by":"crossref","first-page":"e48659","DOI":"10.2196\/48659","article-title":"Assessing the 
utility of ChatGPT throughout the entire clinical workflow: development and usability study","volume":"25","author":"Rao","year":"2023","journal-title":"J Med Internet Res"},{"issue":"3","key":"2025070220014691400_ocae131-B9","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1016\/S2589-7500(23)00021-3","article-title":"ChatGPT: the future of discharge summaries?","volume":"5","author":"Patel","year":"2023","journal-title":"Lancet Digital Health"},{"issue":"7","key":"2025070220014691400_ocae131-B10","doi-asserted-by":"crossref","first-page":"e1324","DOI":"10.1002\/ctm2.1324","article-title":"The application of ChatGPT in healthcare progress notes: a commentary from a clinical and research perspective","volume":"13","author":"Nguyen","year":"2023","journal-title":"Clin Transl Med"},{"issue":"6","key":"2025070220014691400_ocae131-B11","doi-asserted-by":"crossref","first-page":"1296","DOI":"10.1038\/s41591-023-02341-4","article-title":"ChatGPT is not the solution to physicians\u2019 documentation burden","volume":"29","author":"Preiksaitis","year":"2023","journal-title":"Nat Med"},{"issue":"3","key":"2025070220014691400_ocae131-B12","doi-asserted-by":"crossref","first-page":"131","DOI":"10.12793\/tcp.2023.31.e16","article-title":"Transforming clinical trials: the emerging roles of large language models","volume":"31","author":"Ghim","year":"2023","journal-title":"Transl Clin Pharmacol"},{"key":"2025070220014691400_ocae131-B13","author":"den Hamer","year":"2023"},{"issue":"1","key":"2025070220014691400_ocae131-B14","doi-asserted-by":"crossref","first-page":"e48291","DOI":"10.2196\/48291","article-title":"Large language models in medical education: opportunities, challenges, and future directions","volume":"9","author":"Abd-Alrazaq","year":"2023","journal-title":"JMIR Med Educ"},{"issue":"1","key":"2025070220014691400_ocae131-B15","doi-asserted-by":"crossref","first-page":"e50945","DOI":"10.2196\/50945","article-title":"The role of large language models in 
medical education: applications and implications","volume":"9","author":"Safranek","year":"2023","journal-title":"JMIR Med Educ"},{"issue":"2","key":"2025070220014691400_ocae131-B16","doi-asserted-by":"crossref","first-page":"e0000198","DOI":"10.1371\/journal.pdig.0000198","article-title":"Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models","volume":"2","author":"Kung","year":"2023","journal-title":"PLOS Digital Health"},{"issue":"9","key":"2025070220014691400_ocae131-B17","doi-asserted-by":"crossref","first-page":"1558","DOI":"10.1093\/jamia\/ocad104","article-title":"ChatGPT and the clinical informatics board examination: the end of unproctored maintenance of certification?","volume":"30","author":"Kumah-Crystal","year":"2023","journal-title":"J Am Med Informat Assoc"},{"issue":"1","key":"2025070220014691400_ocae131-B18","doi-asserted-by":"crossref","first-page":"e46599","DOI":"10.2196\/46599","article-title":"Trialling a large language model (ChatGPT) in general practice with the applied knowledge test: observational study demonstrating opportunities and limitations in primary care","volume":"9","author":"Thirunavukarasu","year":"2023","journal-title":"JMIR Med Educ"},{"key":"2025070220014691400_ocae131-B19","doi-asserted-by":"crossref","first-page":"e45312","DOI":"10.2196\/45312","article-title":"How does ChatGPT perform on the United States Medical Licensing Examination? 
The implications of large language models for medical education and knowledge assessment","volume":"9","author":"Gilson","year":"2023","journal-title":"JMIR Med Educ"},{"issue":"10","key":"2025070220014691400_ocae131-B20","doi-asserted-by":"crossref","first-page":"e574","DOI":"10.1016\/S2665-9913(23)00216-3","article-title":"Large language models and rheumatology: a comparative evaluation","volume":"5","author":"Venerito","year":"2023","journal-title":"Lancet Rheumatol"},{"key":"2025070220014691400_ocae131-B21","doi-asserted-by":"crossref","first-page":"104770","DOI":"10.1016\/j.ebiom.2023.104770","article-title":"Benchmarking large language models\u2019 performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard","volume":"95","author":"Lim","year":"2023","journal-title":"eBioMedicine"},{"key":"2025070220014691400_ocae131-B22","doi-asserted-by":"crossref","first-page":"131","DOI":"10.18653\/v1\/2023.clinicalnlp-1.17","volume-title":"Proceedings of the 5th Clinical Natural Language Processing Workshop","author":"Chowdhury","year":"2023"},{"key":"2025070220014691400_ocae131-B23","author":"Wei","year":"2023"},{"key":"2025070220014691400_ocae131-B24","author":"Kojima","year":"2023"},{"key":"2025070220014691400_ocae131-B25","author":"Wang","year":"2023"},{"key":"2025070220014691400_ocae131-B26","author":"Yao","year":"2023"},{"key":"2025070220014691400_ocae131-B27","first-page":"2609","author":"Wang","year":"2023"},{"issue":"11","key":"2025070220014691400_ocae131-B28","doi-asserted-by":"crossref","first-page":"2603","DOI":"10.1002\/hbm.21387","article-title":"Stimulating creativity via the exposure to other people\u2019s ideas","volume":"33","author":"Fink","year":"2011","journal-title":"Human Brain Mapping"},{"key":"2025070220014691400_ocae131-B29","author":"Nori"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1964\/58868063\/ocae131.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/31\/9\/1964\/58868063\/ocae131.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,3]],"date-time":"2025-07-03T00:01:59Z","timestamp":1751500919000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/9\/1964\/7705627"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,3]]},"references-count":29,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,7,3]]},"published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae131","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,9]]},"published":{"date-parts":[[2024,7,3]]}}}