{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T20:44:26Z","timestamp":1774644266236,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T00:00:00Z","timestamp":1696982400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Department of Heilongjiang Province","award":["GZ20220131"],"award-info":[{"award-number":["GZ20220131"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Accurately detecting student classroom behaviors in classroom videos is beneficial for analyzing students\u2019 classroom performance and consequently enhancing teaching effectiveness. To address challenges such as object density, occlusion, and multi-scale scenarios in classroom video images, this paper introduces an improved YOLOv8 classroom detection model. Firstly, by combining modules from the Res2Net and YOLOv8 network models, a novel C2f_Res2block module is proposed. This module, along with MHSA and EMA, is integrated into the YOLOv8 model. Experimental results on a classroom detection dataset demonstrate that the improved model in this paper exhibits better detection performance compared to the original YOLOv8, with an average precision (mAP@0.5) increase of 4.2%.<\/jats:p>","DOI":"10.3390\/s23208385","type":"journal-article","created":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T08:18:57Z","timestamp":1697012337000},"page":"8385","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":81,"title":["Student Behavior Detection in the Classroom Based on Improved YOLOv8"],"prefix":"10.3390","volume":"23","author":[{"given":"Haiwei","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4500-1002","authenticated-orcid":false,"given":"Guohui","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Computer Science and Information Engineering, Harbin Normal University, Harbin 150025, China"}]},{"given":"Huixin","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Life Sciences and Technology, Harbin Normal University, Harbin 150025, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,10,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3299","DOI":"10.1007\/s10639-020-10116-4","article-title":"Smart Education Literature: A Theoretical Analysis","volume":"25","author":"Singh","year":"2020","journal-title":"Educ. Inf. Technol."},{"key":"ref_2","first-page":"1","article-title":"Classroom Learning Status Assessment Based on Deep Learning","volume":"2022","author":"Zhou","year":"2022","journal-title":"Math. Probl. Eng."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hu, M., Wei, Y., Li, M., Yao, H., Deng, W., Tong, M., and Liu, Q. (2022). Bimodal Learning Engagement Recognition from Videos in the Classroom. Sensors, 22.","DOI":"10.3390\/s22165932"},{"key":"ref_4","first-page":"1","article-title":"Identifying and Monitoring Students\u2019 Classroom Learning Behavior Based on Multisource Information","volume":"2022","author":"Sun","year":"2022","journal-title":"Mob. Inf. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"9370","DOI":"10.1109\/JSEN.2018.2870957","article-title":"Visual Object Recognition and Pose Estimation Based on a Deep Semantic Segmentation Network","volume":"18","author":"Lin","year":"2018","journal-title":"IEEE Sensors J."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chen, H., and Guan, J. (2022). Teacher\u2013Student Behavior Recognition in Classroom Teaching Based on Improved YOLO-v4 and Internet of Things Technology. Electronics, 11.","DOI":"10.3390\/electronics11233998"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gu, C., Sun, C., Ross, D.A., Vondrick, C., Pantofaru, C., Li, Y., Vijayanarasimhan, S., Toderici, G., Ricco, S., and Sukthankar, R. (2018, January 18\u201323). AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00633"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Fan, H., Malik, J., and He, K. (November, January 27). SlowFast Networks for Video Recognition. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00630"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"510","DOI":"10.1007\/978-3-319-46448-0_31","article-title":"Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding","volume":"Volume 9905","author":"Leibe","year":"2016","journal-title":"Computer Vision\u2014ECCV 2016"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201326). Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_11","unstructured":"(2023, August 18). Ultralytics\/Ultralytics: NEW\u2014YOLOv8 in PyTorch > ONNX > OpenVINO > CoreML > TFLite. Available online: https:\/\/github.com\/ultralytics\/ultralytics."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/JPROC.2023.3238524","article-title":"Object Detection in 20 Years: A Survey","volume":"111","author":"Zou","year":"2023","journal-title":"Proc. IEEE"},{"key":"ref_13","unstructured":"Jolicoeur-Martineau, A., and Mitliagkas, I. (2019). Gradient Penalty from a Maximum Margin Perspective. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"119451","DOI":"10.1016\/j.eswa.2022.119451","article-title":"Hierarchical Belief Rule-Based Model for Imbalanced Multi-Classification","volume":"216","author":"Hu","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_15","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., and Doll\u00e1r, P. (2014). Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland, 6\u201312 September 2014, Springer International Publishing."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Jiang, Y., Zhu, X., Wang, X., Yang, S., Li, W., Wang, H., Fu, P., and Luo, Z. (2017). R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection. arXiv.","DOI":"10.1109\/ICPR.2018.8545598"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"9259","DOI":"10.1609\/aaai.v33i01.33019259","article-title":"M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network","volume":"33","author":"Zhao","year":"2019","journal-title":"AAAI"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Guo, C., Fan, B., Zhang, Q., Xiang, S., and Pan, C. (2020, January 13\u201319). AugFPN: Improving Multi-Scale Feature Learning for Object Detection. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01261"},{"key":"ref_21","first-page":"044004","article-title":"End-to-End Ground Calibration and in-Flight Performance of the FIREBall-2 Instrument","volume":"6","author":"Picouet","year":"2021","journal-title":"J. Astron.Telesc.Instrum. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lu, J., Xiong, C., Parikh, D., and Socher, R. (2017, January 21\u201326). Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.345"},{"key":"ref_23","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yang, F., Wang, T., and Wang, X. (2023). Student Classroom Behavior Detection Based on YOLOv7-BRA and Multi-Model Fusion. arXiv.","DOI":"10.1007\/978-3-031-46311-2_4"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1109\/TPAMI.2019.2938758","article-title":"Res2Net: A New Multi-Scale Backbone Architecture","volume":"43","author":"Gao","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ouyang, D., He, S., Zhang, G., Luo, M., Guo, H., Zhan, J., and Huang, Z. (2023, January 4\u201310). Efficient Multi-Scale Attention Module with Cross-Spatial Learning. Proceedings of the ICASSP 2023\u20142023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece.","DOI":"10.1109\/ICASSP49357.2023.10096516"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Liu, H., Liu, F., Fan, X., and Huang, D. (2021). Polarized Self-Attention: Towards High-Quality Pixel-Wise Regression. arXiv.","DOI":"10.1016\/j.neucom.2022.07.054"},{"key":"ref_28","unstructured":"Fan, Y. (2023). SCB-Dataset: A Dataset for Detecting Student Classroom Behavior. arXiv."},{"key":"ref_29","unstructured":"Shao, S., Zhao, Z., Li, B., Xiao, T., Yu, G., Zhang, X., and Sun, J. (2018). CrowdHuman: A Benchmark for Detecting Human in a Crowd. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, W., Dai, J., Chen, Z., Huang, Z., Li, Z., Zhu, X., Hu, X., Lu, T., Lu, L., and Li, H. (2023, January 17\u201324). InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01385"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/20\/8385\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:04:46Z","timestamp":1760130286000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/20\/8385"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,11]]},"references-count":30,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["s23208385"],"URL":"https:\/\/doi.org\/10.3390\/s23208385","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,11]]}}}