{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,7]],"date-time":"2026-06-07T10:18:35Z","timestamp":1780827515006,"version":"3.54.1"},"reference-count":48,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T00:00:00Z","timestamp":1758844800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.mdpi.com"],"crossmark-restriction":true},"short-container-title":["Information"],"abstract":"<jats:p>In recent years, most conventional emotion recognition approaches have concentrated primarily on facial cues, often overlooking complementary sources of information such as body posture and contextual background. This limitation reduces their effectiveness in complex, real-world environments. In this work, we present a multi-branch emotion recognition framework that separately processes facial, bodily, and contextual information using three dedicated neural networks. To better capture contextual cues, we intentionally mask the face and body of the main subject within the scene, prompting the model to explore alternative visual elements that may convey emotional states. To further enhance the quality of the extracted features, we integrate both channel and spatial attention mechanisms into the network architecture. Evaluated on the challenging NCAER-S dataset, our model achieves an accuracy of 56.42%, surpassing the state-of-the-art GLAMOUR-Net. These results highlight the effectiveness of combining multi-cue representation and attention-guided feature extraction for robust emotion recognition in unconstrained settings. 
The findings also highlight the importance of accurate emotion recognition for human\u2013computer interaction, where affect detection enables systems to adapt to users and deliver more effective experiences.<\/jats:p>","DOI":"10.3390\/info16100834","type":"journal-article","created":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T12:33:03Z","timestamp":1758889983000},"page":"834","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Context-Driven Emotion Recognition: Integrating Multi-Cue Fusion and Attention Mechanisms for Enhanced Accuracy on the NCAER-S Dataset"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-3205-9509","authenticated-orcid":false,"given":"Merieme","family":"Elkorchi","sequence":"first","affiliation":[{"name":"Advanced digital enterprise modeling and information retrieval (ADMIR) laboratory, Rabat IT Center, Information retrieval and data analytics team (IRDA), ENSIAS, Mohammed V University in Rabat, Rabat 11000, Morocco"},{"name":"SMARTiLab Laboratory, Moroccan School of Engineering Sciences (EMSI Rabat\/SMARTILAB), Rabat 11000, Morocco"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Boutaina","family":"Hdioud","sequence":"additional","affiliation":[{"name":"Advanced digital enterprise modeling and information retrieval (ADMIR) laboratory, Rabat IT Center, Information retrieval and data analytics team (IRDA), ENSIAS, Mohammed V University in Rabat, Rabat 11000, Morocco"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9736-7260","authenticated-orcid":false,"given":"Rachid","family":"Oulad Haj Thami","sequence":"additional","affiliation":[{"name":"Advanced digital enterprise modeling and information retrieval (ADMIR) laboratory, Rabat IT Center, Information retrieval and data analytics team (IRDA), ENSIAS, Mohammed V University in Rabat, Rabat 11000, 
Morocco"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Safae","family":"Merzouk","sequence":"additional","affiliation":[{"name":"SMARTiLab Laboratory, Moroccan School of Engineering Sciences (EMSI Rabat\/SMARTILAB), Rabat 11000, Morocco"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,26]]},"reference":[{"key":"ref_1","unstructured":"Randhavane, T., Bhattacharya, U., Kapsaskis, K., Gray, K., Bera, A., and Manocha, D. (2019). Identifying emotions from walking using affective and deep features. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Stathopoulou, I.O., and Tsihrintzis, G.A. (2011). Emotion recognition from body movements and gestures. Intelligent Interactive Multimedia Systems and Services, Proceedings of the 4th International Conference on Intelligent Interactive Multimedia Systems and Services (IIMSS 2011), Piraeus, Greece, 20\u201322 July 2011, Springer.","DOI":"10.1007\/978-3-642-22158-3_29"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1016\/j.neunet.2008.05.003","article-title":"Recognizing emotions expressed by body pose: A biologically inspired neural model","volume":"21","author":"Schindler","year":"2008","journal-title":"Neural Netw."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Han, K., Yu, D., and Tashev, I. (2014, January 14\u201318). Speech emotion recognition using deep neural network and extreme learning machine. 
Proceedings of the Interspeech 2014, Singapore.","DOI":"10.21437\/Interspeech.2014-57"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1016\/j.specom.2008.03.012","article-title":"Fear-type emotion recognition for future audio-based surveillance systems","volume":"50","author":"Clavel","year":"2008","journal-title":"Speech Commun."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Yang, D., Huang, S., Xu, Z., Li, Z., Wang, S., Li, M., Wang, Y., Liu, Y., Yang, K., and Chen, Z. (2023, January 1\u20136). Aide: A vision-driven multi-view, multi-modal, multi-tasking dataset for assistive driving perception. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01871"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ali, M., Mosa, A.H., Al Machot, F., and Kyamakya, K. (2016, January 5\u20138). EEG-based emotion recognition approach for e-healthcare applications. Proceedings of the 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN), Vienna, Austria.","DOI":"10.1109\/ICUFN.2016.7536936"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1016\/j.neunet.2005.03.006","article-title":"Emotion recognition in human\u2013computer interaction","volume":"18","author":"Fragopanagos","year":"2005","journal-title":"Neural Netw."},{"key":"ref_9","unstructured":"Fukui, K., and Yamaguchi, O. Face recognition using multi-viewpoint patterns for robot vision. Proceedings of the Robotics Research. The Eleventh International Symposium: With 303 Figures."},{"key":"ref_10","unstructured":"Ma, X., Lin, W., Huang, D., Dong, M., and Li, H. (2017, January 4\u20136). Facial emotion recognition. 
Proceedings of the 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), Singapore."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.patrec.2019.01.008","article-title":"Extended deep neural network for facial emotion recognition","volume":"120","author":"Jain","year":"2019","journal-title":"Pattern Recognit. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4057","DOI":"10.1109\/TIP.2019.2956143","article-title":"Region attention networks for pose and occlusion robust facial expression recognition","volume":"29","author":"Wang","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2439","DOI":"10.1109\/TIP.2018.2886767","article-title":"Occlusion Aware Facial Expression Recognition Using CNN with Attention Mechanism","volume":"28","author":"Li","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_14","unstructured":"Dhall, A., Goecke, R., Lucey, S., and Gedeon, T. (2011). Acted Facial Expressions in the Wild Database, The Australian National University. TR-CS-11-02."},{"key":"ref_15","unstructured":"Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., and Lee, D.H. (2013). Challenges in representation learning: A report on three machine learning contests. Neural Information Processing: 20th International Conference, ICONIP 2013, Daegu, Korea, 3\u20137 November 2013. Proceedings, Part III 20, Springer."},{"key":"ref_16","unstructured":"Piana, S., Staglian\u00f2, A., Odone, F., Verri, A., and Camurri, A. (2014). Real-time Automatic Emotion Recognition from Body Gestures. arXiv."},{"key":"ref_17","unstructured":"Lee, J., Kim, S., Kim, S., Park, J., and Sohn, K. (November, January 27). Context-aware emotion recognition networks. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_18","unstructured":"Costa, W.L., Mac\u00eado, D., Zanchettin, C., Figueiredo, L.S., and Teichrieb, V. (2021). Multi-Cue Adaptive Emotion Recognition Network. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"103679","DOI":"10.1016\/j.jvcir.2022.103679","article-title":"Context-dependent emotion recognition","volume":"89","author":"Wang","year":"2022","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Meng, D., Peng, X., Wang, K., and Qiao, Y. (2019, January 22\u201325). Frame attention networks for facial expression recognition in videos. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803603"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"64827","DOI":"10.1109\/ACCESS.2019.2917266","article-title":"Local learning with deep and handcrafted features for facial expression recognition","volume":"7","author":"Georgescu","year":"2019","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Garber-Barron, M., and Si, M. (2012, January 10\u201315). Using body movement and posture for emotion detection in non-acted scenarios. Proceedings of the 2012 IEEE International Conference on Fuzzy Systems, Brisbane, Australia.","DOI":"10.1109\/FUZZ-IEEE.2012.6250780"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"11761","DOI":"10.1109\/ACCESS.2019.2963113","article-title":"Emotion recognition from body movement","volume":"8","author":"Ahmed","year":"2019","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Luna-Jim\u00e9nez, C., Griol, D., Callejas, Z., Kleinlein, R., Montero, J.M., and Fern\u00e1ndez-Mart\u00ednez, F. (2021). Multimodal emotion recognition on RAVDESS dataset using transfer learning. 
Sensors, 21.","DOI":"10.3390\/s21227665"},{"key":"ref_25","unstructured":"Caridakis, G., Castellano, G., Kessous, L., Raouzaiou, A., Malatesta, L., Asteriadis, S., and Karpouzis, K. (2007). Multimodal emotion recognition from expressive faces, body gestures and speech. Artificial Intelligence and Innovations 2007: From Theory to Applications, Proceedings of the 4th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI 2007), Athens, Greece, 19\u201321 September 2007, Springer."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kosti, R., Alvarez, J.M., Recasens, A., and Lapedriza, A. (2017, January 21\u201326). Emotion recognition in context. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.212"},{"key":"ref_27","first-page":"2755","article-title":"Context based emotion recognition using emotic dataset","volume":"42","author":"Kosti","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yang, D., Huang, S., Wang, S., Liu, Y., Zhai, P., Su, L., Li, M., and Zhang, L. (2022, January 23\u201327). Emotion recognition for multiple context awareness. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19836-6_9"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"21625","DOI":"10.1007\/s00521-021-06778-x","article-title":"Global-local attention for emotion recognition","volume":"34","author":"Le","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_30","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv."},{"key":"ref_31","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. 
Advances in Neural Information Processing Systems, Curran Associates Inc."},{"key":"ref_32","unstructured":"Han, K., Wang, Y., Chen, H., Chen, X., Guo, J., Liu, Z., Tang, Y., Xiao, A., Xu, C., and Xu, Y. (2020). A survey on visual transformer. arXiv."},{"key":"ref_33","unstructured":"Bello, I., Zoph, B., Vaswani, A., Shlens, J., and Le, Q.V. (November, January 27). Attention augmented convolutional networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_34","unstructured":"Hu, J., Shen, L., Albanie, S., Sun, G., and Vedaldi, A. (2018). Gather-excite: Exploiting feature context in convolutional neural networks. Advances in Neural Information Processing Systems, Curran Associates Inc."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_36","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (November, January 27). Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Aminbeidokhti, M., Pedersoli, M., Cardinal, P., and Granger, E. (2019, January 27\u201329). Emotion Recognition with Spatial Attention and Temporal Softmax Pooling. Proceedings of the Image Analysis and Recognition, Waterloo, ON, Canada.","DOI":"10.1007\/978-3-030-27202-9_29"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018). CBAM: Convolutional Block Attention Module. arXiv.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_40","unstructured":"Jocher, G., Chaurasia, A., and Qiu, J. (2024, January 14). Ultralytics YOLO. Available online: https:\/\/github.com\/ultralytics\/ultralytics."},{"key":"ref_41","first-page":"26","article-title":"Multilayer perceptron: Architecture optimization and training","volume":"4","author":"Ramchoun","year":"2016","journal-title":"Int. J. Interact. Multimed. Artif. Intell."},{"key":"ref_42","unstructured":"Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., and Isard, M. (2016, January 2\u20134). TensorFlow: A system for Large-Scale machine learning. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA."},{"key":"ref_43","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_45","first-page":"1755","article-title":"Dlib-ml: A machine learning toolkit","volume":"10","author":"King","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Dhall, A., Goecke, R., Joshi, J., Hoey, J., and Gedeon, T. (2016, January 12\u201316). Emotiw 2016: Video and group-level emotion recognition challenges. 
Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan.","DOI":"10.1145\/2993148.2997638"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Do, N.T., Kim, S.H., Yang, H.J., Lee, G.S., and Yeom, S. (2021). Context-aware emotion recognition in the wild using spatio-temporal and temporal-pyramid models. Sensors, 21.","DOI":"10.3390\/s21072344"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Xu, H., Kong, J., Kong, X., Li, J., and Wang, J. (2022). MCF-Net: Fusion Network of Facial and Scene Features for Expression Recognition in the Wild. Appl. Sci., 12.","DOI":"10.3390\/app122010251"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/10\/834\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T12:40:52Z","timestamp":1758890452000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/10\/834"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,26]]},"references-count":48,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["info16100834"],"URL":"https:\/\/doi.org\/10.3390\/info16100834","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,26]]}}}