{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T18:19:45Z","timestamp":1776709185407,"version":"3.51.2"},"reference-count":60,"publisher":"Elsevier BV","issue":"4","license":[{"start":{"date-parts":[[2025,4,7]],"date-time":"2025-04-07T00:00:00Z","timestamp":1743984000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,7]],"date-time":"2025-04-07T00:00:00Z","timestamp":1743984000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Manipal Academy of Higher Education, Manipal"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Artif Intell Educ"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The integration of Artificial Intelligence (AI), particularly Chatbot Generative Pre-Trained Transformer (ChatGPT), in medical education has introduced new possibilities for generating various educational resources for assessments. However, ensuring the quality of ChatGPT-generated assessments poses challenges, with limited research in the literature addressing this issue. Recognizing this gap, our study aims to investigate the quality of ChatGPT-based assessment. In this study among first-year medical students, a crossover design was employed to compare scenario-based multiple-choice questions (SBMCQs) crafted by both faculty members and ChatGPT through item analysis to determine the quality of assessment. The study comprised three main phases: development, implementation, and evaluation of SBMCQs. During the development phase, both faculty members and ChatGPT generated 60 SBMCQs each, covering topics related to cardiovascular, respiratory, and endocrinology. 
These questions underwent assessment by independent reviewers, after which 80 SBMCQs were selected for the tests. Subsequently, in the implementation phase, one hundred and twenty students, divided into two batches, were assigned to receive either faculty-generated or ChatGPT-generated questions across four test sessions. The collected data underwent rigorous item analysis and thematic analysis to evaluate the effectiveness and quality of the questions generated by both parties. Only 9 of ChatGPT\u2019s SBMCQs met ideal MCQ criteria on the Difficulty Index, Discrimination Index, and Distractor Effectiveness, contrasting with 19 from faculty. Moreover, ChatGPT\u2019s questions exhibited a higher rate of nonfunctional distractors (33.75% vs. faculty\u2019s 13.75%). During the focus group discussion, faculty highlighted the importance of educators in reviewing, refining, and validating ChatGPT-generated SBMCQs to ensure their appropriateness within the educational context.<\/jats:p>","DOI":"10.1007\/s40593-025-00471-z","type":"journal-article","created":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T09:05:43Z","timestamp":1744103143000},"page":"2315-2344","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Assessing Quality of Scenario-Based Multiple-Choice Questions in Physiology: Faculty-Generated vs. 
ChatGPT-Generated Questions among Phase I Medical Students"],"prefix":"10.1007","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9117-9151","authenticated-orcid":false,"given":"Archana","family":"Chauhan","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3651-5087","authenticated-orcid":false,"given":"Farah","family":"Khaliq","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6124-2239","authenticated-orcid":false,"given":"Kirtana Raghurama","family":"Nayak","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,7]]},"reference":[{"key":"471_CR1","first-page":"1","volume":"2020","author":"A Abdallah","year":"2020","unstructured":"Abdallah, A., Kasem, M., Hamada, M. A., & Sdeek, S. (2020). Automated question-answer medical model based on deep learning technology. Proceedings of the 6th International Conference on Engineering & MIS, 2020, 1\u20138.","journal-title":"Proceedings of the 6th International Conference on Engineering & MIS"},{"key":"471_CR2","doi-asserted-by":"publisher","first-page":"e40977","DOI":"10.7759\/cureus.40977","volume":"15","author":"M Agarwal","year":"2023","unstructured":"Agarwal, M., Sharma, P., & Goswami, A. (2023). Analysing the Applicability of ChatGPT, Bard, and Bing to Generate Reasoning-Based Multiple-Choice Questions in Medical Physiology. Cureus, 15, e40977. https:\/\/doi.org\/10.7759\/cureus.40977","journal-title":"Cureus"},{"issue":"10","key":"471_CR3","doi-asserted-by":"publisher","first-page":"139","DOI":"10.3390\/bdcc8100139","volume":"8","author":"S Al Shuraiqi","year":"2024","unstructured":"Al Shuraiqi, S., Aal Abdulsalam, A., Masters, K., Zidoum, H., & AlZaabi, A. (2024). Automatic Generation of Medical Case-Based Multiple-Choice Questions (MCQs): A Review of Methodologies, Applications, Evaluation, and Future Directions. 
Big Data and Cognitive Computing, 8(10), 139.","journal-title":"Big Data and Cognitive Computing"},{"key":"471_CR4","doi-asserted-by":"publisher","unstructured":"Ayub, I., Hamann, D., Hamann, C. R., & Davis, M. J. (2023). Exploring the potential and limitations of chat generative pre-trained transformer (ChatGPT) in generating board-style dermatology questions: A qualitative analysis. Cureus, 15(8), e43717. https:\/\/doi.org\/10.7759\/cureus.43717","DOI":"10.7759\/cureus.43717"},{"issue":"2","key":"471_CR5","doi-asserted-by":"publisher","first-page":"114","DOI":"10.1016\/j.edurev.2007.06.001","volume":"2","author":"LKJ Baartman","year":"2007","unstructured":"Baartman, L. K. J., Bastiaens, T. J., Kirschner, P. A., & van der Vleuten, C. P. M. (2007). Evaluating assessment quality in competence-based education: A qualitative comparison of two frameworks. Educational Research Review, 2(2), 114\u2013129. https:\/\/doi.org\/10.1016\/j.edurev.2007.06.001","journal-title":"Educational Research Review"},{"key":"471_CR6","doi-asserted-by":"publisher","unstructured":"Badyal, D. K., Jain, A., Lata, H., & Sharma, M. (2023). Triple Cs of scenario-based multiple-choice question: Concept, construction, and corroboration. National Journal of Pharmacology and Therapeutics, 1(1). https:\/\/doi.org\/10.4103\/NJPT.NJPT_16_23","DOI":"10.4103\/NJPT.NJPT_16_23"},{"issue":"3","key":"471_CR7","first-page":"137","volume":"28","author":"DK Badyal","year":"2015","unstructured":"Badyal, D. K., & Singh, T. (2015). Teaching of the basic sciences in medicine: Changing trends. National Medical Journal of India, 28(3), 137\u2013140.","journal-title":"National Medical Journal of India"},{"key":"471_CR8","doi-asserted-by":"publisher","unstructured":"Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big?. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610\u2013623. 
https:\/\/doi.org\/10.1145\/3442188.3445922","DOI":"10.1145\/3442188.3445922"},{"issue":"2","key":"471_CR9","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1191\/1478088706qp063oa","volume":"3","author":"V Braun","year":"2006","unstructured":"Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77\u2013101. https:\/\/doi.org\/10.1191\/1478088706qp063oa","journal-title":"Qualitative Research in Psychology"},{"issue":"8","key":"471_CR10","doi-asserted-by":"publisher","first-page":"e0290691","DOI":"10.1371\/journal.pone.0290691","volume":"18","author":"BHH Cheung","year":"2023","unstructured":"Cheung, B. H. H., Lau, G. K. K., Wong, G. T. C., Lee, E. Y. P., Kulkarni, D., Seow, C. S., Wong, R., & Co, M. T. H. (2023). ChatGPT versus human in generating medical graduate exam multiple choice questions\u2014A multinational prospective study (Hong Kong S.A.R., Singapore, Ireland, and the United Kingdom). PLOS ONE, 18(8), e0290691. https:\/\/doi.org\/10.1371\/journal.pone.0290691","journal-title":"PLOS ONE"},{"issue":"6","key":"471_CR11","doi-asserted-by":"publisher","first-page":"1876","DOI":"10.18203\/2394-6040.ijcmph20172004","volume":"4","author":"D Christian","year":"2017","unstructured":"Christian, D., Prajapati, A., Rana, B., & Dave, V. (2017). Evaluation of multiple choice questions using item analysis tool: A study from a medical institute of Ahmedabad, Gujarat. International Journal Of Community Medicine And Public Health, 4(6), 1876. https:\/\/doi.org\/10.18203\/2394-6040.ijcmph20172004","journal-title":"International Journal Of Community Medicine And Public Health"},{"issue":"3","key":"471_CR12","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1152\/advan.00140.2015","volume":"40","author":"N Cramer","year":"2016","unstructured":"Cramer, N., Asmar, A., Gorman, L., Gros, B., Harris, D., Howard, T., Hussain, M., Salazar, S., & Kibble, J. D. (2016). 
Application of a utility analysis to evaluate a novel assessment tool for clinically oriented physiology and pharmacology. Advances in Physiology Education, 40(3), 304\u2013312. https:\/\/doi.org\/10.1152\/advan.00140.2015","journal-title":"Advances in Physiology Education"},{"issue":"5","key":"471_CR13","doi-asserted-by":"publisher","first-page":"1441","DOI":"10.1007\/s10459-023-10225-y","volume":"28","author":"F Falc\u00e3o","year":"2023","unstructured":"Falc\u00e3o, F., Pereira, D. M., Gon\u00e7alves, N., De Champlain, A., Costa, P., & P\u00eago, J. M. (2023). A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation. Advances in Health Sciences Education, 28(5), 1441\u20131465. https:\/\/doi.org\/10.1007\/s10459-023-10225-y","journal-title":"Advances in Health Sciences Education"},{"issue":"2","key":"471_CR14","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1152\/advan.00213.2023","volume":"48","author":"T Favero","year":"2024","unstructured":"Favero, T. (2024). Using Artificial Intelligence Platforms to Support Student Learning in Physiology. Advances in Physiology Education, 48(2), 193\u2013199. https:\/\/doi.org\/10.1152\/advan.00213.2023","journal-title":"Advances in Physiology Education"},{"key":"471_CR15","doi-asserted-by":"publisher","unstructured":"Ferrara, E. (2023). Should ChatGPT be biased? Challenges and risks of bias in large language models. ArXiv Preprint ArXiv:2304.03738.\u00a0https:\/\/doi.org\/10.5210\/fm.v28i11.13346","DOI":"10.5210\/fm.v28i11.13346"},{"issue":"4","key":"471_CR16","doi-asserted-by":"publisher","first-page":"e157","DOI":"10.1016\/j.jaad.2023.05.054","volume":"89","author":"AL Ferreira","year":"2023","unstructured":"Ferreira, A. L., & Lipoff, J. B. (2023). The complex ethics of applying ChatGPT and language model artificial intelligence in dermatology. 
Journal of the American Academy of Dermatology, 89(4), e157\u2013e158.","journal-title":"Journal of the American Academy of Dermatology"},{"issue":"6","key":"471_CR17","doi-asserted-by":"publisher","first-page":"1172","DOI":"10.1177\/0013164421992535","volume":"81","author":"RC Foster","year":"2021","unstructured":"Foster, R. C. (2021). KR20 and KR21 for some nondichotomous data (it\u2019s not just Cronbach\u2019s alpha). Educational and Psychological Measurement, 81(6), 1172\u20131202.","journal-title":"Educational and Psychological Measurement"},{"issue":"1","key":"471_CR18","doi-asserted-by":"publisher","first-page":"72","DOI":"10.1080\/10401334.2022.2119569","volume":"36","author":"M Gierl","year":"2024","unstructured":"Gierl, M., Swygert, K., Matovinovic, D., Kulesher, A., & Lai, H. (2024). Three Sources of Validation Evidence Needed to Evaluate the Quality of Generated Test Items for Medical Licensure. Teaching and Learning in Medicine, 36(1), 72\u201382. https:\/\/doi.org\/10.1080\/10401334.2022.2119569","journal-title":"Teaching and Learning in Medicine"},{"key":"471_CR19","doi-asserted-by":"publisher","unstructured":"Gilson, A., Safranek, C., Huang, T., Socrates, V., Chi, L., Taylor, R. A., & Chartash, D. (2022). How does ChatGPT perform on the medical licensing exams? The implications of large language models for medical education and knowledge assessment. MedRxiv, 2012\u20132022. https:\/\/doi.org\/10.1101\/2022.12.23.22283901","DOI":"10.1101\/2022.12.23.22283901"},{"issue":"1","key":"471_CR20","doi-asserted-by":"publisher","first-page":"e10836","DOI":"10.1002\/aet2.10836","volume":"7","author":"M Gottlieb","year":"2023","unstructured":"Gottlieb, M., Bailitz, J., Fix, M., Shappell, E., & Wagner, M. J. (2023). Educator\u2019s blueprint: A how-to guide for developing high-quality multiple-choice questions. 
AEM Education and Training, 7(1), e10836.","journal-title":"AEM Education and Training"},{"issue":"3","key":"471_CR21","doi-asserted-by":"publisher","first-page":"210","DOI":"10.4103\/ijabmr.IJABMR_30_20","volume":"10","author":"P Gupta","year":"2020","unstructured":"Gupta, P., Meena, P., Khan, A. M., Malhotra, R. K., & Singh, T. (2020). Effect of faculty training on quality of multiple-choice questions. International Journal of Applied and Basic Medical Research, 10(3), 210\u2013214.","journal-title":"International Journal of Applied and Basic Medical Research"},{"issue":"1","key":"471_CR22","doi-asserted-by":"publisher","first-page":"100027","DOI":"10.1016\/j.chbah.2023.100027","volume":"2","author":"R HadiMogavi","year":"2024","unstructured":"HadiMogavi, R., Deng, C., Juho Kim, J., Zhou, P., Kwon, D. Y., Hosny Saleh Metwally, A., Tlili, A., Bassanelli, S., Bucchiarone, A., Gujar, S., Nacke, L. E., & Hui, P. (2024). ChatGPT in education: A blessing or a curse? A qualitative study exploring early adopters\u2019 utilization and perceptions. Computers in Human Behavior: Artificial Humans, 2(1), 100027. https:\/\/doi.org\/10.1016\/j.chbah.2023.100027","journal-title":"Computers in Human Behavior: Artificial Humans"},{"key":"471_CR23","unstructured":"Hall, J. E. (2015). Guyton and Hall Textbook of Medical Physiology. (13th ed.). W B Saunders."},{"issue":"1","key":"471_CR24","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1038\/s41746-024-01157-x","volume":"7","author":"J Haltaufderheide","year":"2024","unstructured":"Haltaufderheide, J., & Ranisch, R. (2024). The ethics of ChatGPT in medicine and healthcare: A systematic review on Large Language Models (LLMs). NPJ Digital Medicine, 7(1), 183.","journal-title":"NPJ Digital Medicine"},{"issue":"6","key":"471_CR25","doi-asserted-by":"publisher","first-page":"943","DOI":"10.1080\/02602938.2020.1828268","volume":"46","author":"MS Ibarra-S\u00e1iz","year":"2021","unstructured":"Ibarra-S\u00e1iz, M. 
S., Rodr\u00edguez-G\u00f3mez, G., & Boud, D. (2021). The quality of assessment tasks as a determinant of learning. Assessment & Evaluation in Higher Education, 46(6), 943\u2013955. https:\/\/doi.org\/10.1080\/02602938.2020.1828268","journal-title":"Assessment & Evaluation in Higher Education"},{"issue":"8","key":"471_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/0142159X.2023.2294703","volume":"46","author":"IR Indran","year":"2024","unstructured":"Indran, I. R., Paranthaman, P., Gupta, N., & Mustafa, N. (2024). Twelve tips to leverage AI for efficient and effective medical question generation: A guide for educators using Chat GPT. Medical Teacher, 46(8), 1\u20136.","journal-title":"Medical Teacher"},{"issue":"1","key":"471_CR27","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1037\/h0057123","volume":"30","author":"TL Kelley","year":"1939","unstructured":"Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17\u201324. https:\/\/doi.org\/10.1037\/h0057123","journal-title":"Journal of Educational Psychology"},{"key":"471_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s00228-024-03649-x","volume":"80","author":"YS K\u0131yak","year":"2024","unstructured":"K\u0131yak, Y. S., Co\u015fkun, \u00d6., Budako\u011flu, I., & Uluoglu, C. (2024). ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam. European Journal of Clinical Pharmacology, 80, 1\u20137. https:\/\/doi.org\/10.1007\/s00228-024-03649-x","journal-title":"European Journal of Clinical Pharmacology"},{"key":"471_CR29","doi-asserted-by":"publisher","unstructured":"K\u0131yak, Y. S. (2023). A ChatGPT prompt for writing case-based multiple-choice questions. 
Revista Espa\u00f1ola de Educaci\u00f3n M\u00e9dica, 4(3).\u00a0https:\/\/doi.org\/10.6018\/edumed.587451","DOI":"10.6018\/edumed.587451"},{"key":"471_CR30","doi-asserted-by":"publisher","unstructured":"Klang, E., Portugez, S., Gross, R., Brenner, A., Gilboa, M., Ortal, T., Ron, S., Robinzon, V., Meiri, H., & Segal, G. (2023). Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: A medical education pilot study with GPT-4. BMC Medical Education, 23.\u00a0https:\/\/doi.org\/10.1186\/s12909-023-04752-w","DOI":"10.1186\/s12909-023-04752-w"},{"issue":"7","key":"471_CR31","doi-asserted-by":"publisher","first-page":"5614","DOI":"10.3390\/su15075614","volume":"15","author":"C Kooli","year":"2023","unstructured":"Kooli, C. (2023). Chatbots in Education and Research: A Critical Examination of Ethical Implications and Solutions. Sustainability, 15(7), 5614. https:\/\/doi.org\/10.3390\/su15075614","journal-title":"Sustainability"},{"key":"471_CR32","doi-asserted-by":"publisher","first-page":"JC01","DOI":"10.7860\/JCDR\/2021\/48157.14818","volume":"15","author":"A Kulkarni","year":"2021","unstructured":"Kulkarni, A., & Gowda, M. (2021). Multiple Case Scenarios Based Integrated Teaching among First Year Medical Students- A Cross-sectional Study. Journal of Clinical and Diagnostic Research, 15, JC01\u2013JC05. https:\/\/doi.org\/10.7860\/JCDR\/2021\/48157.14818","journal-title":"Journal of Clinical and Diagnostic Research"},{"key":"471_CR33","doi-asserted-by":"publisher","first-page":"S85","DOI":"10.1016\/j.mjafi.2020.11.007","volume":"77","author":"D Kumar","year":"2021","unstructured":"Kumar, D., Jaipurkar, R., Shekhar, A., Sikri, G., & Srinivas, V. (2021). Item analysis of multiple choice questions: A quality assurance test for an assessment tool. Medical Journal Armed Forces India, 77, S85\u2013S89. 
https:\/\/doi.org\/10.1016\/j.mjafi.2020.11.007","journal-title":"Medical Journal Armed Forces India"},{"key":"471_CR34","doi-asserted-by":"publisher","DOI":"10.1097\/ACM.0000000000005626","author":"M Laupichler","year":"2023","unstructured":"Laupichler, M., Rother, J., Grunwald Kadow, I., Ahmadi, S., & Raupach, T. (2023). Large Language Models in Medical Education: Comparing ChatGPT- to Human-Generated Exam Questions. Academic Medicine. https:\/\/doi.org\/10.1097\/ACM.0000000000005626","journal-title":"Academic Medicine"},{"key":"471_CR35","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1007\/s40593-018-00172-w","volume":"29","author":"J Leo","year":"2019","unstructured":"Leo, J., Kurdi, G., Matentzoglu, N., Parsia, B., Sattler, U., Forge, S., Donato, G., & Dowling, W. (2019). Ontology-based generation of medical, multi-term MCQs. International Journal of Artificial Intelligence in Education, 29, 145\u2013188.","journal-title":"International Journal of Artificial Intelligence in Education"},{"key":"471_CR36","doi-asserted-by":"publisher","unstructured":"Liu, H., Ning, R., Teng, Z., Liu, J., Zhou, Q., & Zhang, Y. (2023). Evaluating the logical reasoning ability of ChatGPT and GPT-4.\u00a0https:\/\/doi.org\/10.48550\/arXiv.2304.03439","DOI":"10.48550\/arXiv.2304.03439"},{"key":"471_CR37","doi-asserted-by":"publisher","unstructured":"Lonergan, R. M., Curry, J., Dhas, K., & Simmons, B. I. (2023). Stratified evaluation of GPT\u2019s question answering in surgery reveals artificial intelligence (AI) knowledge gaps. Cureus, 15(11).\u00a0https:\/\/doi.org\/10.7759\/cureus.48788","DOI":"10.7759\/cureus.48788"},{"issue":"1","key":"471_CR38","doi-asserted-by":"publisher","first-page":"120","DOI":"10.5334\/pme.47","volume":"2","author":"A MacLeod","year":"2023","unstructured":"MacLeod, A., Luong, V., Cameron, P., Burm, S., Field, S., Kits, O., Miller, S., & Stewart, W. A. (2023). Case-informed learning in medical education: A call for ontological fidelity. 
Perspectives on Medical Education, 2(1), 120.","journal-title":"Perspectives on Medical Education"},{"issue":"7","key":"471_CR39","doi-asserted-by":"publisher","first-page":"673","DOI":"10.1080\/0142159X.2023.2208731","volume":"45","author":"K Masters","year":"2023","unstructured":"Masters, K. (2023). Medical Teacher\u2019s first ChatGPT\u2019s referencing hallucinations: Lessons for editors, reviewers, and teachers. Medical Teacher, 45(7), 673\u2013675.","journal-title":"Medical Teacher"},{"key":"471_CR40","doi-asserted-by":"publisher","unstructured":"Moore, S., Costello, E., Nguyen, H. A., & Stamper, J. (2024). An automatic question usability evaluation toolkit. International Conference on Artificial Intelligence in Education, 31\u201346.\u00a0https:\/\/doi.org\/10.48550\/arXiv.2405.20529","DOI":"10.48550\/arXiv.2405.20529"},{"issue":"1","key":"471_CR41","doi-asserted-by":"publisher","first-page":"100099","DOI":"10.1016\/j.acpath.2023.100099","volume":"11","author":"A Ngo","year":"2023","unstructured":"Ngo, A., Gupta, S., Perrine, O., Reddy, R., Ershadi, S., & Remick, D. (2023). ChatGPT 3.5 fails to write appropriate multiple choice practice exam questions. Academic Pathology, 11(1), 100099.","journal-title":"Academic Pathology"},{"issue":"3","key":"471_CR42","doi-asserted-by":"publisher","first-page":"206","DOI":"10.3109\/0142159X.2011.551559","volume":"33","author":"J Norcini","year":"2011","unstructured":"Norcini, J., Anderson, B., Bollela, V., Burch, V., Costa, M. J., Duvivier, R., Galbraith, R., Hays, R., Kent, A., Perrott, V., & Roberts, T. (2011). Criteria for good assessment: Consensus statement and recommendations from the Ottawa 2010 Conference. Medical Teacher, 33(3), 206\u2013214. 
https:\/\/doi.org\/10.3109\/0142159X.2011.551559","journal-title":"Medical Teacher"},{"key":"471_CR43","doi-asserted-by":"publisher","first-page":"103537","DOI":"10.1016\/j.nepr.2022.103537","volume":"66","author":"S O\u2019Connor","year":"2022","unstructured":"O\u2019Connor, S. (2022). Open artificial intelligence platforms in nursing education: Tools for academic progress or abuse? Nurse Education in Practice, 66, 103537.","journal-title":"Nurse Education in Practice"},{"key":"471_CR44","unstructured":"OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https:\/\/chat.openai.com\/chat. Accessed May 2023."},{"key":"471_CR45","volume-title":"Constructing written test questions for the basic and clinical sciences","author":"MA Paniagua","year":"2016","unstructured":"Paniagua, M. A., & Swygert, K. A. (2016). Constructing written test questions for the basic and clinical sciences. National Board of Medical Examiners."},{"key":"471_CR46","doi-asserted-by":"publisher","unstructured":"Reyna, J. (2023). Writing Effective Multiple-Choice Questions in Medical Education. INTED2023 Proceedings, 1\u201310.\u00a0https:\/\/doi.org\/10.21125\/inted.2023","DOI":"10.21125\/inted.2023"},{"key":"471_CR47","doi-asserted-by":"publisher","first-page":"118258","DOI":"10.1016\/j.eswa.2022.118258","volume":"208","author":"R Rodriguez-Torrealba","year":"2022","unstructured":"Rodriguez-Torrealba, R., Garcia-Lopez, E., & Garcia-Cabot, A. (2022). End-to-end generation of multiple-choice questions using text-to-text transfer transformer models. Expert Systems with Applications, 208, 118258.","journal-title":"Expert Systems with Applications"},{"issue":"9","key":"471_CR48","first-page":"323","volume":"57","author":"S Ross","year":"2011","unstructured":"Ross, S., Poth, C. N., Donoff, M., Humphries, P., Steiner, I., Schipper, S., Janke, F., & Nichols, D. (2011). Competency-based achievement system: Using formative feedback to teach and assess family medicine residents\u2019 skills. 
Canadian Family Physician, 57(9), 323\u2013330.","journal-title":"Canadian Family Physician"},{"issue":"10","key":"471_CR49","doi-asserted-by":"publisher","first-page":"469","DOI":"10.3390\/a17100469","volume":"17","author":"M Saad","year":"2024","unstructured":"Saad, M., Almasri, W., Hye, T., Roni, M., & Mohiyeddini, C. (2024). Analysis of ChatGPT-3.5\u2019s Potential in Generating NBME-Standard Pharmacology Questions: What Can Be Improved? Algorithms, 17(10), 469.","journal-title":"Algorithms"},{"key":"471_CR50","doi-asserted-by":"publisher","unstructured":"Sallam, M. (2023). The utility of ChatGPT as an example of large language models in healthcare education, research and practice: Systematic review on the future perspectives and potential limitations. MedRxiv, 2022\u20132023.\u00a0https:\/\/doi.org\/10.1101\/2023.02.19.23286155","DOI":"10.1101\/2023.02.19.23286155"},{"issue":"1","key":"471_CR51","doi-asserted-by":"publisher","first-page":"9330","DOI":"10.1038\/s41598-024-58760-x","volume":"14","author":"A Shieh","year":"2024","unstructured":"Shieh, A., Tran, B., He, G., Kumar, M., Freed, J. A., & Majety, P. (2024). Assessing ChatGPT 4.0\u2019s test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports. Scientific Reports, 14(1), 9330.","journal-title":"Scientific Reports"},{"key":"471_CR52","doi-asserted-by":"publisher","first-page":"937","DOI":"10.47175\/rielsj.v4i4.835","volume":"4","author":"M Sihite","year":"2023","unstructured":"Sihite, M., Meisuri, M., & Sibarani, B. (2023). Examining the Validity and Reliability of ChatGPT 3.5-Generated Reading Comprehension Questions for Academic Texts. Randwick International of Education and Linguistics Science Journal, 4, 937\u2013944. https:\/\/doi.org\/10.47175\/rielsj.v4i4.835","journal-title":"Randwick International of Education and Linguistics Science Journal"},{"key":"471_CR53","doi-asserted-by":"publisher","unstructured":"Singh, T., Gupta, P., & Singh, D. (2013). 
Principles of Medical Education. Jaypee Brothers, New Delhi, 2013. https:\/\/doi.org\/10.13140\/2.1.4503.6487","DOI":"10.13140\/2.1.4503.6487"},{"key":"471_CR54","first-page":"119","volume":"1","author":"A Supe","year":"2018","unstructured":"Supe, A., Sheshadri, G. K., Singh, P., Sajith, K. R., Chalam, P. V., & Maulik, K. S. (2018). Medical Council of India, Competency based Undergraduate curriculum for the Indian Medical Graduate. New Delhi: Medical Council of India, 1, 119\u2013135.","journal-title":"New Delhi: Medical Council of India"},{"issue":"2","key":"471_CR55","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1016\/j.jtumed.2013.12.002","volume":"9","author":"F Taib","year":"2014","unstructured":"Taib, F., & Yusoff, M. S. B. (2014). Difficulty index, discrimination index, sensitivity and specificity of long case and multiple choice questions to predict medical students\u2019 examination performance. Journal of Taibah University Medical Sciences, 9(2), 110\u2013114. https:\/\/doi.org\/10.1016\/j.jtumed.2013.12.002","journal-title":"Journal of Taibah University Medical Sciences"},{"key":"471_CR56","doi-asserted-by":"publisher","unstructured":"Temsah, O., Khan, S. A., Chaiah, Y., Senjab, A., Alhasan, K., Jamal, A., Aljamaan, F., Malki, K. H., Halwani, R., & Al-Tawfiq, J. A. (2023). Overview of early ChatGPT\u2019s presence in medical literature: Insights from a hybrid literature review by ChatGPT and human experts. Cureus, 15(4).\u00a0https:\/\/doi.org\/10.7759\/cureus.37281","DOI":"10.7759\/cureus.37281"},{"key":"471_CR57","doi-asserted-by":"publisher","first-page":"1237432","DOI":"10.3389\/fmed.2023.1237432","volume":"10","author":"W Tong","year":"2023","unstructured":"Tong, W., Guan, Y., Chen, J., Huang, X., Zhong, Y., Zhang, C., & Zhang, H. (2023). Artificial intelligence in global health equity: An evaluation and discussion on the application of ChatGPT, in the Chinese National Medical Licensing Examination. 
Frontiers in Medicine, 10, 1237432.","journal-title":"Frontiers in Medicine"},{"issue":"1","key":"471_CR58","doi-asserted-by":"publisher","first-page":"215824402210821","DOI":"10.1177\/21582440221082130","volume":"12","author":"L Yunjiu","year":"2022","unstructured":"Yunjiu, L., Wei, W., & Zheng, Y. (2022). Artificial intelligence-generated and human expert-designed vocabulary tests: A comparative study. SAGE Open, 12(1), 21582440221082130.","journal-title":"SAGE Open"},{"issue":"7","key":"471_CR59","first-page":"1","volume":"41","author":"Y Zhu","year":"2023","unstructured":"Zhu, Y., & Yang, F. (2023). ChatGPT\/AIGC and educational innovation: Opportunities, challenges, and the future. Journal of East China Normal University (Educational Sciences), 41(7), 1.","journal-title":"Journal of East China Normal University (Educational Sciences)"},{"key":"471_CR60","doi-asserted-by":"publisher","first-page":"1224","DOI":"10.1080\/0142159X.2023.2249239","volume":"45","author":"M Zuckerman","year":"2023","unstructured":"Zuckerman, M., Flood, R., Tan, R., Kelp, N., Ecker, D., Menke, J., & Lockspeiser, T. (2023). ChatGPT for assessment writing. Medical Teacher, 45, 1224\u20131227. 
https:\/\/doi.org\/10.1080\/0142159X.2023.2249239","journal-title":"Medical Teacher"}],"container-title":["International Journal of Artificial Intelligence in Education"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00471-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40593-025-00471-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40593-025-00471-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T18:12:40Z","timestamp":1772647960000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40593-025-00471-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,7]]},"references-count":60,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["471"],"URL":"https:\/\/doi.org\/10.1007\/s40593-025-00471-z","relation":{},"ISSN":["1560-4292","1560-4306"],"issn-type":[{"value":"1560-4292","type":"print"},{"value":"1560-4306","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,7]]},"assertion":[{"value":"23 March 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 April 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}