{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T18:22:19Z","timestamp":1778955739977,"version":"3.51.4"},"reference-count":38,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2022,8,9]],"date-time":"2022-08-09T00:00:00Z","timestamp":1660003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Collaborative Innovation Experimental Base Construction Project for Teacher Development of Central China Normal University","award":["CCNUTEIII-2021-19"],"award-info":[{"award-number":["CCNUTEIII-2021-19"]}]},{"name":"National Collaborative Innovation Experimental Base Construction Project for Teacher Development of Central China Normal University","award":["2022010801010274"],"award-info":[{"award-number":["2022010801010274"]}]},{"name":"National Collaborative Innovation Experimental Base Construction Project for Teacher Development of Central China Normal University","award":["20YJC880100"],"award-info":[{"award-number":["20YJC880100"]}]},{"name":"Special Project of Wuhan Knowledge Innovation","award":["CCNUTEIII-2021-19"],"award-info":[{"award-number":["CCNUTEIII-2021-19"]}]},{"name":"Special Project of Wuhan Knowledge Innovation","award":["2022010801010274"],"award-info":[{"award-number":["2022010801010274"]}]},{"name":"Special Project of Wuhan Knowledge Innovation","award":["20YJC880100"],"award-info":[{"award-number":["20YJC880100"]}]},{"name":"Humanities and Social Sciences of China MOE","award":["CCNUTEIII-2021-19"],"award-info":[{"award-number":["CCNUTEIII-2021-19"]}]},{"name":"Humanities and Social Sciences of China MOE","award":["2022010801010274"],"award-info":[{"award-number":["2022010801010274"]}]},{"name":"Humanities and Social Sciences of China MOE","award":["20YJC880100"],"award-info":[{"award-number":["20YJC880100"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Engagement plays an essential role in the learning process. Recognition of learning engagement in the classroom helps us understand the student\u2019s learning state and optimize the teaching and study processes. Traditional recognition methods such as self-report and teacher observation are time-consuming and obtrusive to satisfy the needs of large-scale classrooms. With the development of big data analysis and artificial intelligence, applying intelligent methods such as deep learning to recognize learning engagement has become the research hotspot in education. In this paper, based on non-invasive classroom videos, first, a multi-cues classroom learning engagement database was constructed. Then, we introduced the power IoU loss function to You Only Look Once version 5 (YOLOv5) to detect the students and obtained a precision of 95.4%. Finally, we designed a bimodal learning engagement recognition method based on ResNet50 and CoAtNet. Our proposed bimodal learning engagement method obtained an accuracy of 93.94% using the KNN classifier. The experimental results confirmed that the proposed method outperforms most state-of-the-art techniques.<\/jats:p>","DOI":"10.3390\/s22165932","type":"journal-article","created":{"date-parts":[[2022,8,9]],"date-time":"2022-08-09T04:16:55Z","timestamp":1660018615000},"page":"5932","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["Bimodal Learning Engagement Recognition from Videos in the Classroom"],"prefix":"10.3390","volume":"22","author":[{"given":"Meijia","family":"Hu","sequence":"first","affiliation":[{"name":"Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China"},{"name":"Huanggang High School of Hubei Province, Huanggang 438000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yantao","family":"Wei","sequence":"additional","affiliation":[{"name":"Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mengsiying","family":"Li","sequence":"additional","affiliation":[{"name":"School of Management, Wuhan College, Wuhan 430212, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5055-4106","authenticated-orcid":false,"given":"Huang","family":"Yao","sequence":"additional","affiliation":[{"name":"Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei","family":"Deng","sequence":"additional","affiliation":[{"name":"Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingwen","family":"Tong","sequence":"additional","affiliation":[{"name":"Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qingtang","family":"Liu","sequence":"additional","affiliation":[{"name":"Hubei Research Center for Educational Informationization, Faculty of Artificial Intelligence in Education, Central China Normal University, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"101146","DOI":"10.1016\/j.stueduc.2022.101146","article-title":"Engagement, achievement, and teacher classroom practices in mathematics: Insights from TIMSS 2011 and PISA 2012","volume":"73","author":"Zhang","year":"2022","journal-title":"Stud. Educ. Eval."},{"key":"ref_2","first-page":"72","article-title":"Multi-modal Learning Analysis for Group Multi Engagement Feature Portrait of Collaborative Learning","volume":"40","author":"Ma","year":"2022","journal-title":"J. Distance Educ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"59","DOI":"10.3102\/00346543074001059","article-title":"School engagement: Potential of the concept, state of the evidence","volume":"74","author":"Fredricks","year":"2004","journal-title":"Rev. Educ. Res."},{"key":"ref_4","first-page":"3","article-title":"Predicting affective states expressed through an emote-aloud procedure from AutoTutor\u2019s mixed-initiative dialogue","volume":"16","author":"Craig","year":"2006","journal-title":"Int. J. Artif. Intell. Educ."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Grafsgaard, J.F., Fulton, R.M., Boyer, K.E., Wiebe, E.N., and Lester, J.C. (2012, January 22\u201326). Multimodal analysis of the implicit affective channel in computer-mediated textual communication. Proceedings of the 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA, USA.","DOI":"10.1145\/2388676.2388708"},{"key":"ref_6","unstructured":"S\u00fcmer, \u00d6., Goldberg, P., D\u2019Mello, S., Gerjets, P., Trautwein, U., and Kasneci, E. (2021). Multimodal engagement analysis from facial videos in the classroom. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1016\/j.compedu.2016.02.006","article-title":"Students\u2019 LMS interaction patterns and their relationship with achievement: A case study in higher education","volume":"96","author":"Cerezo","year":"2016","journal-title":"Comput. Educ."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Okubo, F., Yamashita, T., Shimada, A., and Ogata, H. (2017, January 13\u201317). A neural network approach for students\u2019 performance prediction. Proceedings of the Seventh International Learning Analytics Knowledge Conference, Vancouver, BC, Canada.","DOI":"10.1145\/3027385.3029479"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1016\/j.iheduc.2015.11.003","article-title":"Identifying significant indicators using LMS data to predict course achievement in online learning","volume":"29","author":"You","year":"2016","journal-title":"Internet High. Educ."},{"key":"ref_10","first-page":"88","article-title":"Engagement tracing: Using response times to model student disengagement","volume":"125","author":"Joseph","year":"2005","journal-title":"Artif. Intell. Educ. Supporting Learn. Through Intell. Soc. Inf. Technol."},{"key":"ref_11","first-page":"30","article-title":"Intelligent tutoring goes to school in the big city","volume":"8","author":"Koedinger","year":"1997","journal-title":"Int. J. Artif. Intell. Educ."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2449","DOI":"10.1109\/TMM.2021.3081873","article-title":"MFDNet: Collaborative Poses Perception and Matrix Fisher Distribution for Head Pose Estimation","volume":"24","author":"Liu","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1016\/j.neucom.2020.12.090","article-title":"NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation and on-task behavior understanding in the classroom","volume":"436","author":"Liu","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Hamester, D., Barros, P., and Wermter, S. (2015, January 12\u201316). Face expression recognition with a 2-channel convolutional neural network. Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland.","DOI":"10.1109\/IJCNN.2015.7280539"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ebrahimi Kahou, S., Michalski, V., Konda, K., Memisevic, R., and Pal, C. (2015, January 9\u201313). Recurrent neural networks for emotion recognition in video. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Washington, DC, USA.","DOI":"10.1145\/2818346.2830596"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.neucom.2020.05.081","article-title":"Infrared facial expression recognition via Gaussian-based label distribution learning in the dark illumination environment for human emotion detection","volume":"409","author":"Zhang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_17","first-page":"5008010","article-title":"Robust 3-D Gaze Estimation via Data Optimization and Saliency Aggregation for Mobile Eye-Tracking Systems","volume":"70","author":"Liu","year":"2021","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"103660","DOI":"10.1016\/j.infrared.2021.103660","article-title":"Human pose recognition via adaptive distribution encoding for action perception in the self-regulated learning process","volume":"114","author":"Liu","year":"2021","journal-title":"Infrared Phys. Technol."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"e12839","DOI":"10.1111\/exsy.12839","article-title":"An intelligent system for monitoring students\u2019 engagement in large classroom teaching through facial expression recognition","volume":"39","author":"Pabba","year":"2022","journal-title":"Expert Syst."},{"key":"ref_20","unstructured":"Ventura, J., Cruz, S., and Boult, T.E. (2016, January 26). Improving teaching and learning through video 53 summaries of student engagement. Proceedings of the Workshop on Computational Models for Learning Systems and Educational Assessment (CMLA 2016), Las Vegas, NV, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"150693","DOI":"10.1109\/ACCESS.2019.2947519","article-title":"Unobtrusive behavioral analysis of students in classroom environment using non-verbal cues","volume":"7","author":"Ashwin","year":"2019","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kumar, S., Yadav, D., Gupta, H., and Verma, O.P. (2022). Smart Classroom Surveillance System Using YOLOv3 Algorithm. Recent Innovations in Mechanical Engineering, Springer.","DOI":"10.1007\/978-981-16-9236-9_6"},{"key":"ref_23","first-page":"7049458","article-title":"Classroom Learning Status Assessment Based on Deep Learning","volume":"2022","author":"Zhou","year":"2022","journal-title":"Math. Probl. Eng."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ren, X., and Yang, D. (2021, January 20\u201322). Student behavior detection based on YOLOv4-Bi. Proceedings of the 2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE), Beijing, China.","DOI":"10.1109\/CSAIEE54046.2021.9543310"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"29686","DOI":"10.1109\/ACCESS.2021.3058526","article-title":"Semi-supervised dim and small infrared ship detection network based on haar wavelet","volume":"9","author":"Song","year":"2021","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"12861","DOI":"10.1007\/s11227-022-04402-w","article-title":"An improved method of identifying learner\u2019s behaviors based on deep learning","volume":"78","author":"Liu","year":"2022","journal-title":"J. Supercomput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kim, D., Park, S., Kang, D., and Paik, J. (2019, January 8\u201311). Improved center and scale prediction-based pedestrian detection using convolutional block. Proceedings of the 2019 IEEE 9th International Conference on Consumer Electronics (ICCE-Berlin), Berlin, Germany.","DOI":"10.1109\/ICCE-Berlin47944.2019.8966154"},{"key":"ref_28","unstructured":"Targ, S., Almeida, D., and Lyman, K. (2016). Resnet in resnet: Generalizing residual architectures. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2017, January 21\u201326). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_30","first-page":"84","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30, Available online: https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf."},{"key":"ref_32","first-page":"3965","article-title":"Coatnet: Marrying convolution and attention for all data sizes","volume":"34","author":"Dai","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_33","first-page":"20","article-title":"Maximum likelihood estimation of observer error-rates using the EM algorithm","volume":"28","author":"Dawid","year":"1979","journal-title":"J. R. Stat. Soc. Ser. C Appl. Stat."},{"key":"ref_34","first-page":"20230","article-title":"alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression","volume":"34","author":"He","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1016\/j.procs.2021.08.098","article-title":"Student Behavior Recognition in Classroom using Deep Transfer Learning with VGG-16","volume":"192","author":"Abdallah","year":"2021","journal-title":"Procedia Comput. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"U\u00e7ar, M.U., and \u00d6zdemir, E. (2022). Recognizing Students and Detecting Student Engagement with Real-Time Image Processing. Electronics, 11.","DOI":"10.3390\/electronics11091500"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"4361","DOI":"10.1109\/TII.2021.3128240","article-title":"EDMF: Efficient Deep Matrix Factorization with Review Feature Learning for Industrial Recommender System","volume":"18","author":"Liu","year":"2022","journal-title":"IEEE Trans. Ind. Inf."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1016\/j.neucom.2020.09.068","article-title":"Anisotropic angle distribution learning for head pose estimation and attention understanding in human-computer interaction","volume":"433","author":"Liu","year":"2021","journal-title":"Neurocomputing"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/16\/5932\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:06:01Z","timestamp":1760141161000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/16\/5932"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,9]]},"references-count":38,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["s22165932"],"URL":"https:\/\/doi.org\/10.3390\/s22165932","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,9]]}}}