{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T16:09:46Z","timestamp":1765814986726,"version":"3.48.0"},"reference-count":31,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T00:00:00Z","timestamp":1765756800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Educational chatbots are increasingly deployed to scaffold student learning, yet educators lack scalable ways to assess the cognitive depth of these dialogues in situ. Bloom\u2019s taxonomy provides a principled lens for characterizing reasoning, but manual tagging of conversational turns is costly and difficult to scale for learning analytics. We present a reproducible high-confidence pseudo-labeling pipeline for multi-label Bloom classification of Socratic student\u2013chatbot exchanges. The dataset comprises 6716 utterances collected from conversations between a Socratic chatbot and 34 undergraduate statistics students at Nanyang Technological University. From three chronologically selected workbooks with expert Bloom annotations, we trained and compared two labeling tracks: (i) a calibrated classical approach using SentenceTransformer (all-MiniLM-L6-v2) embeddings with one-vs-rest Logistic Regression, Linear SVM, XGBoost, and MLP, followed by per-class precision\u2013recall threshold tuning; and (ii) a lightweight LLM track using GPT-4o-mini after supervised fine-tuning. Class-specific thresholds tuned on 5-fold cross-validation were then applied in a single pass to assign high-confidence pseudo-labels to the remaining unlabeled exchanges, avoiding feedback-loop confirmation bias. 
Fine-tuned GPT-4o-mini achieved the highest prevalence-weighted performance (micro-F1 = 0.814), whereas calibrated classical models yielded stronger balance across Bloom levels (best macro-F1 = 0.630 with Linear SVM; best classical micro-F1 = 0.759 with Logistic Regression). Both model families reflect the corpus skew toward lower-order cognition, with LLMs excelling on common patterns and linear models better preserving rarer higher-order labels. While results should be interpreted as a proof-of-concept given limited gold labeling, the approach substantially reduces annotation burden and provides a practical pathway for Bloom-aware learning analytics and future real-time adaptive chatbot support.<\/jats:p>","DOI":"10.3390\/computers14120555","type":"journal-article","created":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T15:52:59Z","timestamp":1765813979000},"page":"555","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning Analytics with Scalable Bloom\u2019s Taxonomy Labeling of Socratic Chatbot Dialogues"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9379-8079","authenticated-orcid":false,"given":"Kok Wai","family":"Lee","sequence":"first","affiliation":[{"name":"Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1637-1610","authenticated-orcid":false,"given":"Yee Sin","family":"Ang","sequence":"additional","affiliation":[{"name":"Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5619-2051","authenticated-orcid":false,"given":"Joel Weijia","family":"Lai","sequence":"additional","affiliation":[{"name":"Institute for Pedagogical Innovation, Research and Excellence, Nanyang Technological University, Singapore 639798, 
Singapore"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,15]]},"reference":[{"key":"ref_1","unstructured":"Bloom, B.S., Engelhart, M.D., Furst, E.J., Hill, W.H., and Krathwohl, D.R. (1956). Taxonomy of Educational Objectives: The Classification of Educational Goals. Handbook I: Cognitive Domain, Longmans, Green."},{"key":"ref_2","unstructured":"Anderson, L., and Krathwohl, D. (2001). A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom\u2019s Taxonomy of Educational Objectives, Longman."},{"key":"ref_3","unstructured":"Society for Learning Analytics Research (2025, September 23). What is Learning Analytics?. Available online: https:\/\/www.solaresearch.org\/about\/what-is-learning-analytics\/."},{"key":"ref_4","first-page":"2781","article-title":"Chatbots in higher education: A systematic review","volume":"33","author":"Chen","year":"2024","journal-title":"Interact. Learn. Environ."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1186\/s41239-023-00426-1","article-title":"Role of AI chatbots in education: Systematic literature review","volume":"20","author":"Labadze","year":"2023","journal-title":"Int. J. Educ. Technol. High. Educ."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Favero, L., P\u00e9rez-Ortiz, J.A., K\u00e4ser, T., and Oliver, N. (2025). Enhancing Critical Thinking in Education by Means of a Socratic Chatbot. AI in Education and Educational Research, Springer Nature.","DOI":"10.1007\/978-3-031-93409-4_2"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Fakour, H., and Imani, M. (2025). Socratic wisdom in the age of AI: A comparative study of ChatGPT and human tutors in enhancing critical thinking skills. Front. 
Educ., 10.","DOI":"10.3389\/feduc.2025.1528603"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"32","DOI":"10.18608\/jla.2025.8549","article-title":"Leveraging Process-Action Epistemic Network Analysis to Illuminate Student Self-Regulated Learning with a Socratic Chatbot","volume":"12","author":"Lai","year":"2025","journal-title":"J. Learn. Anal."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"105323","DOI":"10.1016\/j.compedu.2025.105323","article-title":"Chatbots in education: A systematic review of objectives, underlying technology and theory, evaluation criteria, and impacts","volume":"234","author":"Debets","year":"2025","journal-title":"Comput. Educ."},{"key":"ref_10","unstructured":"Li, Y., Rakovic, M., Poh, B.X., Gasevic, D., and Chen, G. (2022, January 24\u201327). Automatic Classification of Learning Objectives Based on Bloom\u2019s Taxonomy. Proceedings of the 15th International Conference on Educational Data Mining, Durham, UK."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"100404","DOI":"10.1016\/j.caeai.2025.100404","article-title":"Leveraging generative AI for course learning outcome categorization using Bloom\u2019s taxonomy","volume":"8","author":"Almatrafi","year":"2025","journal-title":"Comput. Educ. Artif. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1016\/j.sbspro.2012.09.278","article-title":"Automated Analysis of Exam Questions According to Bloom\u2019s Taxonomy","volume":"59","author":"Omar","year":"2012","journal-title":"Procedia-Soc. Behav. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Huang, J., Zhang, Z., Qiu, J., Peng, L., Liu, D., Han, P., and Luo, K. (2021, January 15\u201317). Automatic Classroom Question Classification Based on Bloom\u2019s Taxonomy. 
Proceedings of the 13th International Conference on Education Technology and Computers, ICETC 2021, London, UK.","DOI":"10.1145\/3498765.3498771"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kazemi Vanhari, F., Anand, C., and Welch, C. (2025, January 20\u201321). Analyzing Interview Questions via Bloom\u2019s Taxonomy to Enhance the Design Thinking Process. Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), Vienna, Austria.","DOI":"10.18653\/v1\/2025.bea-1.42"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.iheduc.2009.08.003","article-title":"Supporting active cognitive processing in collaborative groups: The potential of Bloom\u2019s taxonomy as a labeling tool","volume":"12","author":"Valcke","year":"2009","journal-title":"Internet High. Educ."},{"key":"ref_16","unstructured":"Li, X., Liu, J., Wang, X., and Chen, S. (2024). A Survey on Incomplete Multi-label Learning: Recent Advances and Future Trends. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, H., Zhao, X., and Wang, D. (2022, January 10\u201314). Semi-supervised Learning for Multi-label Video Action Detection. Proceedings of the 30th ACM International Conference on Multimedia, MM\u201922, Lisbon, Portugal.","DOI":"10.1145\/3503161.3547980"},{"key":"ref_18","unstructured":"Bird, S., Klein, E., and Loper, E. (2009, January 9). Natural language processing with Python. Proceedings of the 18th International Conference on Computational Linguistics: Demonstrations, Singapore."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/219717.219748","article-title":"WordNet: A lexical database for English","volume":"38","author":"Miller","year":"1995","journal-title":"Commun. ACM"},{"key":"ref_20","unstructured":"Jurafsky, D., and Martin, J.H. (2009). 
Speech and Language Processing, Prentice Hall."},{"key":"ref_21","unstructured":"Manning, C.D., and Sch\u00fctze, H. (1999). Foundations of Statistical Natural Language Processing, MIT Press."},{"key":"ref_22","unstructured":"Reimers, N., and Gurevych, I. (2025, May 01). all-MiniLM-L6-v2. Available online: https:\/\/huggingface.co\/sentence-transformers\/all-MiniLM-L6-v2."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019, January 3\u20137). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_24","first-page":"5776","article-title":"MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers","volume":"33","author":"Wang","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","article-title":"The regression analysis of binary sequences","volume":"20","author":"Cox","year":"1958","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). XGBoost: A scalable tree boosting system. 
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1023\/A:1009982220290","article-title":"An evaluation of statistical approaches to text categorization","volume":"1","author":"Yang","year":"1999","journal-title":"Inf. Retr."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On Information and Sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/12\/555\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T16:06:39Z","timestamp":1765814799000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/12\/555"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,15]]},"references-count":31,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["computers14120555"],"URL":"https:\/\/doi.org\/10.3390\/computers14120555","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,15]]}}}