{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,23]],"date-time":"2026-07-23T02:52:56Z","timestamp":1784775176997,"version":"3.55.0"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T00:00:00Z","timestamp":1735862400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T00:00:00Z","timestamp":1735862400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"IC4LANG","award":["KK-2023-00094"],"award-info":[{"award-number":["KK-2023-00094"]}]},{"name":"Deep Knowledge","award":["PID2021-127777OB-C21"],"award-info":[{"award-number":["PID2021-127777OB-C21"]}]},{"name":"MCIN\/AEI","award":["10.13039\/501100011033"],"award-info":[{"award-number":["10.13039\/501100011033"]}]},{"name":"The University of the Basque Country","award":["PIF20\/154 UPV\/EHU 2020"],"award-info":[{"award-number":["PIF20\/154 UPV\/EHU 2020"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Educ Inf Technol"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Recent advancements in large language models (LLMs) have shown potential in enhancing educational practices, particularly in technology-assisted learning environments. This study critically evaluates the reasoning capabilities of LLMs, such as ChatGPT, within the context of chemistry education. We designed targeted adversarial prompts that challenge the models to solve complex chemistry problems and assessed their performance. 
By pushing the boundaries of LLM reasoning, we aim to identify their limitations and strengths in handling queries within the chemistry domain. Our findings expose inherent weaknesses in current AI systems, emphasizing the necessity of cautious AI deployment in teaching methodologies. We argue for a balanced approach, leveraging the benefits of LLMs while mitigating their limitations, to facilitate their seamless adoption in education.<\/jats:p>","DOI":"10.1007\/s10639-024-13295-6","type":"journal-article","created":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T09:38:28Z","timestamp":1735897108000},"page":"11463-11482","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Evaluating and challenging the reasoning capabilities of generative artificial intelligence for technology-assisted chemistry education"],"prefix":"10.1007","volume":"30","author":[{"given":"Suna-\u015eeyma","family":"U\u00e7ar","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7479-4718","authenticated-orcid":false,"given":"Inigo","family":"Lopez-Gazpio","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Josu","family":"Lopez-Gazpio","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,1,3]]},"reference":[{"key":"13295_CR1","doi-asserted-by":"crossref","unstructured":"Arora, D., Singh, H. G., et al. (2023). Have llms advanced enough? a challenging problem solving benchmark for large language models. arXiv preprint arXiv:2305.15074","DOI":"10.18653\/v1\/2023.emnlp-main.468"},{"issue":"5","key":"13295_CR2","doi-asserted-by":"publisher","first-page":"1905","DOI":"10.1021\/acs.jchemed.3c00027","volume":"100","author":"TM Clark","year":"2023","unstructured":"Clark, T. M. (2023). 
Investigating the use of an artificial intelligence chatbot with general chemistry exam questions. Journal of Chemical Education, 100(5), 1905\u20131916.","journal-title":"Journal of Chemical Education"},{"issue":"16","key":"13295_CR3","doi-asserted-by":"publisher","first-page":"425","DOI":"10.1177\/002205741608401602","volume":"84","author":"J Dewey","year":"1916","unstructured":"Dewey, J. (1916). Nationalizing education. Journal of Education, 84(16), 425\u2013428.","journal-title":"Journal of Education"},{"key":"13295_CR4","doi-asserted-by":"crossref","unstructured":"de Wynter, A., Wang, X., Sokolov, A., Gu, Q., & Chen, S.-Q. (2023). An evaluation on large language model outputs: Discourse and memorization. arXiv preprint arXiv:2304.08637","DOI":"10.1016\/j.nlp.2023.100024"},{"issue":"4","key":"13295_CR5","doi-asserted-by":"publisher","first-page":"1413","DOI":"10.1021\/acs.jchemed.3c00063","volume":"100","author":"ME Emenike","year":"2023","unstructured":"Emenike, M. E., & Emenike, B. U. (2023). Was this title generated by chatgpt? considerations for artificial intelligence text-generation software programs for chemists and chemistry educators. Journal of Chemical Education, 100(4), 1413\u2013141.","journal-title":"Journal of Chemical Education"},{"issue":"3","key":"13295_CR6","doi-asserted-by":"publisher","first-page":"4","DOI":"10.3102\/0013189X018003004","volume":"18","author":"RH Ennis","year":"1989","unstructured":"Ennis, R. H. (1989). Critical thinking and subject specificity: Clarification and needed research. Educational researcher, 18(3), 4\u201310.","journal-title":"Educational researcher"},{"issue":"8","key":"13295_CR7","doi-asserted-by":"publisher","first-page":"2972","DOI":"10.1021\/acs.jchemed.3c00481","volume":"100","author":"B Exintaris","year":"2023","unstructured":"Exintaris, B., Karunaratne, N., & Yuriev, E. (2023). 
Metacognition and critical thinking: using chatgpt-generated responses as prompts for critique in a problem-solving workshop (smartchemper). Journal of Chemical Education, 100(8), 2972\u20132980.","journal-title":"Journal of Chemical Education"},{"issue":"4","key":"13295_CR8","doi-asserted-by":"publisher","first-page":"1672","DOI":"10.1021\/acs.jchemed.3c00087","volume":"100","author":"S Fergus","year":"2023","unstructured":"Fergus, S., Botha, M., & Ostovar, M. (2023). Evaluating academic answers generated using chatgpt. Journal of Chemical Education, 100(4), 1672\u20131675.","journal-title":"Journal of Chemical Education"},{"key":"13295_CR9","unstructured":"Guo, T., Guo, K., Nan, B., Liang, Z., Guo, Z., Chawla, N.V., . . . Zhang, X. (2023). What can large language models do in chemistry? a comprehensive benchmark on eight tasks. arXiv preprint arXiv:2305.18365"},{"issue":"12","key":"13295_CR10","doi-asserted-by":"publisher","first-page":"4876","DOI":"10.1021\/acs.jchemed.3c00505","volume":"100","author":"Y Guo","year":"2023","unstructured":"Guo, Y., & Lee, D. (2023). Leveraging chatgpt for enhancing critical thinking skills. Journal of Chemical Education, 100(12), 4876\u20134883.","journal-title":"Journal of Chemical Education"},{"issue":"4","key":"13295_CR11","doi-asserted-by":"publisher","first-page":"1434","DOI":"10.1021\/acs.jchemed.3c00006","volume":"100","author":"T Humphry","year":"2023","unstructured":"Humphry, T., & Fuller, A. L. (2023). Potential chatgpt use in undergraduate chemistry laboratories. Journal of Chemical Education, 100(4), 1434\u20131436.","journal-title":"Journal of Chemical Education"},{"key":"13295_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00579-z","volume":"14","author":"Y Kwon","year":"2022","unstructured":"Kwon, Y., Lee, D., Choi, Y.-S., & Kang, S. (2022). Uncertainty-aware prediction of chemical reaction yields with graph neural networks. 
Journal of Cheminformatics, 14, 1\u201310.","journal-title":"Journal of Cheminformatics"},{"key":"13295_CR13","first-page":"15","volume":"72","author":"I Lopez-Gazpio","year":"2024","unstructured":"Lopez-Gazpio, I. (2024). Revisiting challenges and hazards in large language model evaluation. Procesamiento del Lenguaje Natural, 72, 15\u201330.","journal-title":"Procesamiento del Lenguaje Natural"},{"key":"13295_CR14","unstructured":"Lu, P., Mishra, S., Xia, T., Qiu, L., Chang, K.-W., Zhu, S.-C., . . . Kalyan, A. (2022). Learn to explain: Multimodal reasoning via thought chains for science question answering. The 36th conference on neural information processing systems (neurips)."},{"key":"13295_CR15","unstructured":"Lu, P., Peng, B., Cheng, H., Galley, M., Chang, K.-W., Wu, Y. N., . . . Gao, J. (2024). Chameleon: Plug-and-play compositional reasoning with large language models. Advances in Neural Information Processing Systems, 36."},{"key":"13295_CR16","volume-title":"Critical thinking and education","author":"JE McPeck","year":"1981","unstructured":"McPeck, J. E. (1981). Critical thinking and education. Routledge."},{"issue":"1","key":"13295_CR17","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1007\/s40692-021-00199-4","volume":"9","author":"NS Raj","year":"2022","unstructured":"Raj, N. S., & Renumol, V. (2022). A systematic literature review on adaptive content recommenders in personalized learning environments from 2015 to 2020. Journal of Computers in Education, 9(1), 113\u2013148.","journal-title":"Journal of Computers in Education"},{"key":"13295_CR18","doi-asserted-by":"crossref","unstructured":"Rudolph, J., Tan, S., & Tan, S. (2023). Chatgpt: Bullshit spewer or the end of traditional assessments in higher education? Journal of Applied Learning and Teaching, 6(1)","DOI":"10.37074\/jalt.2023.6.1.9"},{"key":"13295_CR19","unstructured":"Shanahan, M. (2022). Talking about large language models. 
arXiv preprint arXiv:2212.03551"},{"key":"13295_CR20","doi-asserted-by":"crossref","unstructured":"Sun, L., Han, Y., Zhao, Z., Ma, D., Shen, Z., Chen, B., . . . Yu, K. (2023). Scieval: A multi-level large language model evaluation benchmark for scientific research. arXiv preprint arXiv:2308.13149","DOI":"10.1609\/aaai.v38i17.29872"},{"issue":"8","key":"13295_CR21","doi-asserted-by":"publisher","first-page":"2821","DOI":"10.1021\/acs.jchemed.3c00472","volume":"100","author":"V Talanquer","year":"2023","unstructured":"Talanquer, V. (2023). Interview with the chatbot: How does it reason? Journal of Chemical Education, 100(8), 2821\u20132824.","journal-title":"Journal of Chemical Education"},{"key":"13295_CR22","doi-asserted-by":"crossref","unstructured":"Tan, C., Wei, J., Gao, Z., Sun, L., Li, S., Yang, X., & Li, S. Z. (2023). Boosting the power of small multimodal reasoning models to match larger models with self-consistency training. arXiv preprint arXiv:2311.14109","DOI":"10.1007\/978-3-031-73661-2_17"},{"issue":"8","key":"13295_CR23","doi-asserted-by":"publisher","first-page":"3098","DOI":"10.1021\/acs.jchemed.3c00361","volume":"100","author":"J Tyson","year":"2023","unstructured":"Tyson, J. (2023). Shortcomings of chatgpt. Journal of Chemical Education, 100(8), 3098\u20133101.","journal-title":"Journal of Chemical Education"},{"issue":"1","key":"13295_CR24","doi-asserted-by":"publisher","first-page":"41","DOI":"10.3200\/CTCH.53.1.41-48","volume":"53","author":"T van Gelder","year":"2005","unstructured":"van Gelder, T. (2005). Teaching critical thinking: Some lessons from cognitive science. College Teaching, 53(1), 41\u201348. https:\/\/doi.org\/10.3200\/CTCH.53.1.41-48","journal-title":"College Teaching"},{"key":"13295_CR25","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017). Attention is all you need. 
Advances in neural information processing systems, 30"},{"key":"13295_CR26","unstructured":"Wei, J., Tan, C., Gao, Z., Sun, L., Li, S., Yu, B., . . . Li, S. Z. (2023). Enhancing humanlike multi-modal reasoning: A new challenging dataset and comprehensive framework. arXiv preprint arXiv:2307.12626"},{"key":"13295_CR27","doi-asserted-by":"crossref","unstructured":"Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., & Pande, V. (2018). Moleculenet: a benchmark for molecular machine learning. Chemical science, 9(2), 513\u2013530.","DOI":"10.1039\/C7SC02664A"},{"key":"13295_CR28","unstructured":"Zheng, G., Yang, B., Tang, J., Zhou, H.-Y., & Yang, S. (2023). Ddcot: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models. arXiv preprint arXiv:2310.16436"}],"container-title":["Education and Information Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10639-024-13295-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10639-024-13295-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10639-024-13295-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T06:42:30Z","timestamp":1748587350000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10639-024-13295-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,3]]},"references-count":28,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["13295"],"URL":"https:\/\/doi.org\/10.1007\/s10639-024-13295-6","relation":{},"ISSN":["1360-2357","1573-7608"],"issn-type":[{"value":"1360-2357","
type":"print"},{"value":"1573-7608","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,3]]},"assertion":[{"value":"16 July 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 December 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed Consent"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Statement Regarding Research Involving Human Participants and\/or Animals"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Participate"}},{"value":"Not applicable.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Publish"}},{"value":"The Authors state that there is no conflict of interest.","order":7,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}]}}