{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T14:02:01Z","timestamp":1774965721051,"version":"3.50.1"},"reference-count":27,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T00:00:00Z","timestamp":1742256000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia","award":["IFPRC-054-612-2020"],"award-info":[{"award-number":["IFPRC-054-612-2020"]}]},{"name":"King Abdulaziz University","award":["IFPRC-054-612-2020"],"award-info":[{"award-number":["IFPRC-054-612-2020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Improving the recognition of online learning engagement is a critical issue in educational information technology, due to the complexities of student behavior and varying assessment standards. Additionally, the scarcity of publicly available datasets for engagement recognition exacerbates this challenge. The majority of existing methods for detecting student engagement necessitate significant amounts of annotated data to capture variations in behaviors and interaction patterns. To address these limitations, we investigate few-shot learning (FSL) techniques to reduce the dependency on extensive training data. Transformer-based models have shown comprehensive results for video-based facial recognition tasks, thus paving new ground for understanding complicated patterns. In this research, we propose an innovative FSL model that employs a prototypical network with the vision transformer (ViT) model pre-trained on a face recognition dataset (e.g., MS1MV2) for spatial feature extraction, followed by an LSTM layer for temporal feature extraction. 
This approach effectively addresses the challenges of limited labeled data in engagement recognition. Our proposed approach achieves state-of-the-art performance on the EngageNet dataset, demonstrating its efficacy and potential in advancing engagement recognition research.<\/jats:p>","DOI":"10.3390\/computers14030109","type":"journal-article","created":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T04:34:43Z","timestamp":1742272483000},"page":"109","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Transformer-Based Student Engagement Recognition Using Few-Shot Learning"],"prefix":"10.3390","volume":"14","author":[{"given":"Wejdan","family":"Alarefah","sequence":"first","affiliation":[{"name":"Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University (KAU), Jeddah 21589, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1086-6599","authenticated-orcid":false,"given":"Salma Kammoun","family":"Jarraya","sequence":"additional","affiliation":[{"name":"Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University (KAU), Jeddah 21589, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9578-8475","authenticated-orcid":false,"given":"Nihal","family":"Abuzinadah","sequence":"additional","affiliation":[{"name":"Computer Science Department, Faculty of Computing and Information Technology, King Abdulaziz University (KAU), Jeddah 21589, Saudi Arabia"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hu, M., and Li, H. (2017, January 27\u201329). Student engagement in online learning: A review. 
Proceedings of the 2017 International Symposium on Educational Technology, ISET 2017, Hong Kong, China.","DOI":"10.1109\/ISET.2017.17"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.jksuci.2018.12.008","article-title":"A new emotion\u2013based affective model to detect student\u2019s engagement","volume":"33","author":"Altuwairqi","year":"2021","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1177\/0735633119825575","article-title":"Data-driven Online Learning Engagement Detection via Facial Expression and Mouse Behavior Recognition Technology","volume":"58","author":"Zhang","year":"2020","journal-title":"J. Educ. Comput. Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Mohamad Nezami, O., Dras, M., Hamey, L., Richards, D., Wan, S., and Paris, C. (2020). Automatic Recognition of Student Engagement Using Deep Learning and Facial Expression. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer.","DOI":"10.1007\/978-3-030-46133-1_17"},{"key":"ref_5","first-page":"2655","article-title":"Engagement detection based on analyzing micro body gestures using 3D CNN","volume":"70","author":"Khenkar","year":"2022","journal-title":"Comput. Mater. Contin."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kaur, A., Mustafa, A., Mehta, L., and Dhall, A. (2018, January 10\u201313). Prediction and Localization of Student Engagement in the Wild. Proceedings of the 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, ACT, Australia. Available online: http:\/\/arxiv.org\/abs\/1804.00858.","DOI":"10.1109\/DICTA.2018.8615851"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Murshed, M., Dewan, M.A.A., Lin, F., and Wen, D. (2019, January 5\u20138). 
Engagement Detection in e-Learning Environments using Convolutional Neural Networks. Proceedings of the 2019 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC\/PiCom\/CBDCom\/CyberSciTech), Fukuoka, Japan.","DOI":"10.1109\/DASC\/PiCom\/CBDCom\/CyberSciTech.2019.00028"},{"key":"ref_8","first-page":"63","article-title":"Generalizing from a Few Examples: A Survey on Few-shot Learning","volume":"53","author":"Wang","year":"2020","journal-title":"ACM Comput. Surv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Singh, M., Hoque, X., Zeng, D., Wang, Y., Ikeda, K., and Dhall, A. (2023, January 9\u201313). Do I Have Your Attention: A Large Scale Engagement Prediction Dataset and Baselines. Proceedings of the 25th International Conference on Multimodal Interaction, Paris, France.","DOI":"10.1145\/3577190.3614164"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, Y., Zhang, H., Zhang, W., Lu, G., Tian, Q., and Ling, N. (2022). Few-Shot Image Classification: Current Status and Research Trends. Electronics, 11.","DOI":"10.3390\/electronics11111752"},{"key":"ref_11","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Dan, J., Liu, Y., Xie, H., Deng, J., Xie, H., Xie, X., and Sun, B. (2023, January 2\u20136). TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01887"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mandia, S., Singh, K., and Mitharwal, R. (2022, January 5\u20137). Vision Transformer for Automatic Student Engagement Estimation. Proceedings of the 2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS), Genova, Italy.","DOI":"10.1109\/IPAS55744.2022.10052945"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hasnine, M.N., Bui, H.T.T., Tran, T.T.T., Nguyen, H.T., Ak\u00e7ap\u0131nar, G., and Ueda, H. (2021). Students\u2019 emotion extraction and visualization for engagement detection in online learning. Procedia Computer Science, Elsevier B.V.","DOI":"10.1016\/j.procs.2021.09.115"},{"key":"ref_15","unstructured":"Snell, J., Swersky, K., and Zemel, T.R. (2017, January 4\u20139). Prototypical Networks for Few-shot Learning. Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, H., Fu, Y., and Meng, J. (2024, January 25\u201327). Engagement Detection in Online Learning Based on Pre-trained Vision Transformer and Temporal Convolutional Network. Proceedings of the 2024 36th Chinese Control and Decision Conference (CCDC), Xi\u2019an, China.","DOI":"10.1109\/CCDC62350.2024.10588350"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Maddula, N.V.S.S., Nair, L.R., Addepalli, H., and Palaniswamy, S. (2021). Emotion Recognition from Facial Expressions Using Siamese Network. 
Communications in Computer and Information Science, Springer Science and Business Media Deutschland GmbH.","DOI":"10.1007\/978-981-16-0419-5_6"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2290","DOI":"10.1109\/TGRS.2018.2872830","article-title":"Deep Few-Shot Learning for Hyperspectral Image Classification","volume":"57","author":"Liu","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., and Hospedales, T.M. (2018, January 18\u201322). Learning to Compare: Relation Network for Few-Shot Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00131"},{"key":"ref_20","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, Online."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1016\/j.aej.2024.06.074","article-title":"Leveraging part-and-sensitive attention network and transformer for learner engagement detection","volume":"107","author":"Su","year":"2024","journal-title":"Alex. Eng. J."},{"key":"ref_22","unstructured":"Sch\u00f6lkopf, B., Platt, J., and Hoffman, T. (2006). A Kernel Method for the Two-Sample-Problem. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"99573","DOI":"10.1109\/ACCESS.2022.3206779","article-title":"Students Engagement Level Detection in Online e-Learning Using Hybrid EfficientNetB7 Together With TCN, LSTM, and Bi-LSTM","volume":"10","author":"Selim","year":"2022","journal-title":"IEEE Access"},{"key":"ref_25","unstructured":"Mandia, S., Singh, K., Mitharwal, R., Mushtaq, F., and Janu, D. (2025). Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning. arXiv."},{"key":"ref_26","unstructured":"Tieu, B.H., Nguyen, T.T., and Nguyen, T.T. (2019, January 12\u201313). Detecting Student Engagement in Classrooms for Intelligent Tutoring Systems. Proceedings of the 2019 6th NAFOSTED Conference on Information and Computer Science (NICS), Hanoi, Vietnam."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3535","DOI":"10.1007\/s11760-023-02578-z","article-title":"Detecting Disengagement in Virtual Learning as an Anomaly using Temporal Convolutional Network Autoencoder","volume":"17","author":"Abedi","year":"2023","journal-title":"Signal Image Video 
Process."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/3\/109\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:55:37Z","timestamp":1760028937000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/3\/109"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,18]]},"references-count":27,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["computers14030109"],"URL":"https:\/\/doi.org\/10.3390\/computers14030109","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,18]]}}}