{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T10:15:02Z","timestamp":1779876902655,"version":"3.53.1"},"reference-count":64,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T00:00:00Z","timestamp":1769904000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>This research addresses the challenge of inferring complex psychological states, including stress, fatigue, anxiety, cognitive load, and boredom, from facial expressions. We propose an interpretable, literature-informed emotion-weighting methodology that transforms the eight-emotion probability outputs of facial emotion recognition models into continuous estimates of these five psychological states using weights derived from the Valence\u2013Arousal framework, providing a principled bridge between discrete emotion predictions and higher-level affective constructs. The proposed formulation is evaluated across six representative deep learning architectures\u2014a baseline CNN (ResNet-50), a modern CNN (ConvNeXt), a hybrid attention-based model (DDAMFN), and three Transformer-based models (ViT, BEiT, and Swin). Our results demonstrate that strong performance on discrete FER tasks does not directly translate to consistent behavior in complex state inference; instead, architectures capable of preserving subtle and distributed affective cues yield more stable and interpretable state estimates, with DDAMFN and Vision Transformer models exhibiting the most consistent performance across the evaluated psychological states. These findings highlight the central role of the proposed emotion-weighting formulation and the importance of architecture selection beyond categorical accuracy in complex affective state analysis.<\/jats:p>","DOI":"10.3390\/computers15020077","type":"journal-article","created":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T10:03:29Z","timestamp":1770113009000},"page":"77","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A Multimodal Transformer-Based Framework for Emotion Analysis in Multilingual Video Content"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-9923-9590","authenticated-orcid":false,"given":"Sehmus","family":"Yakut","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Yildiz Technical University, 34220 Istanbul, Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0377-1310","authenticated-orcid":false,"given":"Yusuf Taha","family":"Tuten","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Yildiz Technical University, 34220 Istanbul, Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eren","family":"Caglar","sequence":"additional","affiliation":[{"name":"R&D Center, Aktif Bank, 34220 Istanbul, Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7908-5067","authenticated-orcid":false,"given":"Mehmet S.","family":"Aktas","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Yildiz Technical University, 34220 Istanbul, Turkey"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Jia, C., Jianping, L., Changrun, C., and Lixi, C. (2023, January 15\u201317). A review of driver fatigue detection based on facial expression recognition. Proceedings of the 2023 20th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China.","DOI":"10.1109\/ICCWAMTIP60502.2023.10387098"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Gao, H., Yuce, A., and Thiran, J.P. (2014, January 27\u201330). Detecting emotional stress from facial expressions for driving safety. Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France.","DOI":"10.1109\/ICIP.2014.7026203"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/TAFFC.2017.2740923","article-title":"AffectNet: A database for facial expression, valence, and arousal computing in the wild","volume":"10","author":"Mollahosseini","year":"2017","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1080\/0309877X.2019.1658729","article-title":"Class-Related Boredom among University Students: A Qualitative Research on Boredom Coping Strategies","volume":"44","author":"Finkielsztein","year":"2020","journal-title":"J. Furth. High. Educ."},{"key":"ref_5","unstructured":"Klingner, J. (2010). Measuring Cognitive Load During Visual Tasks by Combining Pupillometry and Eye Tracking, Stanford University."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep residual learning for image recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, January 18\u201324). A convnet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Leong, F.H. (2020, January 26\u201328). Deep learning of facial embeddings and facial landmark points for the detection of academic emotions. Proceedings of the 5th International Conference on Information and Education Innovations (ICIEI \u201920), New York, NY, USA.","DOI":"10.1145\/3411681.3411684"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"6609","DOI":"10.1007\/s10489-020-02139-8","article-title":"Deep facial spatiotemporal network for engagement prediction in online learning","volume":"51","author":"Liao","year":"2021","journal-title":"Appl. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Mishra, P., Verma, A.S., Chaudhary, P., and Dutta, A. (2024, January 5\u20137). Emotion Recognition from Facial Expression Using Deep Learning Techniques. Proceedings of the 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), Pune, India.","DOI":"10.1109\/I2CT61223.2024.10543313"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3375","DOI":"10.11591\/ijece.v13i3.pp3375-3383","article-title":"Facial emotion recognition using deep learning detector and classifier","volume":"13","author":"Ooi","year":"2023","journal-title":"Int. J. Electr. Comput. Eng. (IJECE)"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1195","DOI":"10.1109\/TAFFC.2020.2981446","article-title":"Deep Facial Expression Recognition: A Survey","volume":"13","author":"Li","year":"2022","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_14","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_15","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. arXiv."},{"key":"ref_16","unstructured":"Bao, H., Dong, L., and Wei, F. (2021). BEiT: BERT Pre-Training of Image Transformers. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Pecoraro, R., Basile, V., Bono, V., and Gallo, S. (2021). Local Multi-Head Channel Self-Attention for Facial Expression Recognition. arXiv.","DOI":"10.3390\/info13090419"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"79327","DOI":"10.1109\/ACCESS.2024.3407108","article-title":"PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition","volume":"12","author":"Ngwe","year":"2024","journal-title":"IEEE Access"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, S., Zhang, Y., Zhang, Y., Wang, Y., and Song, Z. (2023). A Dual-Direction Attention Mixed Feature Network for Facial Expression Recognition. Electronics, 12.","DOI":"10.3390\/electronics12173595"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"El Boudouri, Y., and Bohi, A. (2023, January 27\u201329). EmoNeXt: An Adapted ConvNeXt for Facial Emotion Recognition. Proceedings of the 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP), Poitiers, France.","DOI":"10.1109\/MMSP59012.2023.10337732"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Bie, M., Xu, H., Gao, Y., Song, K., and Che, X. (2024). Swin-FER: Swin Transformer for Facial Expression Recognition. Appl. Sci., 14.","DOI":"10.3390\/app14146125"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kim, J.H., Kim, N., and Won, C.S. (2022). Facial Expression Recognition with Swin Transformer. arXiv.","DOI":"10.3390\/s22103729"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Cambria, E., and Hussain, A. (2015). Sentic Computing: A Common-Sense-Based Framework for Concept-Level Sentiment Analysis, Springer International Publishing. Socio-Affective Computing.","DOI":"10.1007\/978-3-319-23654-4"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"32297","DOI":"10.1109\/ACCESS.2019.2901521","article-title":"Learning Affective Video Features for Facial Expression Recognition via Hybrid Deep Learning","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hasani, B., and Mahoor, M.H. (2017, January 1\u20133). Facial expression recognition using enhanced deep 3d convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Bangalore, India.","DOI":"10.1109\/CVPRW.2017.282"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"5007","DOI":"10.3934\/mbe.2024221","article-title":"Micro-expression recognition based on multi-scale 3D residual convolutional neural network","volume":"21","author":"Jin","year":"2024","journal-title":"Math. Biosci. Eng."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Aouayeb, M., Soladi\u00e9, C., Hamidouche, W., Kpalma, K., and S\u00e9guier, R. (2022). Spatiotemporal features fusion from local facial regions for micro-expressions recognition. Front. Signal Process., 2.","DOI":"10.3389\/frsip.2022.861469"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wang, H., Li, B., Wu, S., Shen, S., Liu, F., Ding, S., and Zhou, A. (2023, January 18\u201322). Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition. Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01722"},{"key":"ref_29","unstructured":"Bertasius, G., Wang, H., and Torresani, L. (2021, January 18\u201324). Is space-time attention all you need for video understanding?. Proceedings of the International Conference on Machine Learning (ICML), Virtual."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1109\/TAFFC.2024.3436913","article-title":"SVFAP: Self-Supervised Video Facial Affect Perceiver","volume":"16","author":"Sun","year":"2025","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Rafiei Oskooei, A., Akta\u015f, M.S., and Kele\u015f, M. (2024). Seeing the Sound: Multilingual Lip Sync for Real-Time Face-to-Face Translation. Computers, 14.","DOI":"10.3390\/computers14010007"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rafiei Oskooei, A., Yahsi, E., Sungur, M., and Aktas, M.S. (2024, January 1\u20134). Can One Model Fit All? An Exploration of Wav2Lip\u2019s Lip-Syncing Generalizability Across Culturally Distinct Languages. Proceedings of the International Conference on Computational Science and Its Applications, Hanoi, Vietnam.","DOI":"10.1007\/978-3-031-65282-0_10"},{"key":"ref_33","unstructured":"Verma, A., Goyal, A., and Kaur, D. (2019). Fatigue Detection. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Shang, Y., Yang, M., Cui, J., Cui, L., Huang, Z., and Li, X. (2023). Driver emotion and fatigue state detection based on time series fusion. Electronics, 12.","DOI":"10.3390\/electronics12010026"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"175136","DOI":"10.1109\/ACCESS.2025.3612325","article-title":"A Comprehensive Face Parsing Framework for Anxiety Detection Using Deep Learning","volume":"13","author":"Panickar","year":"2025","journal-title":"IEEE Access"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Seo, H., Kim, S., and Lee, E.C. (2025). Defining and Analyzing Nervousness Using AI-Based Facial Expression Recognition. Mathematics, 13.","DOI":"10.3390\/math13111745"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Bruin, J., Stuldreher, I.V., Perone, P., van der Veen, F.M., van der Wee, N.J., Giltay, E.J., van der Mast, C.A., Neerincx, M.A., and van der Heiden, C. (2024). Detection of arousal and valence from facial expressions and physiological responses evoked by different types of stressors. Front. Neuroergonomics, 5.","DOI":"10.3389\/fnrgo.2024.1338243"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Krejtz, K., Duchowski, A.T., Niedzielska, A., Biele, C., and Krejtz, I. (2018). Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0203629"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Dell\u2019Acqua, P., Garofalo, M., La Rosa, F., and Villari, M. (2025). Your Eyes Under Pressure: Real-Time Estimation of Cognitive Load with Smooth Pursuit Tracking. Big Data Cogn. Comput., 9.","DOI":"10.3390\/bdcc9110288"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/s40101-015-0063-5","article-title":"Analysis of physiological signals for recognition of boredom, pain, and surprise emotions","volume":"34","author":"Jang","year":"2015","journal-title":"J. Physiol. Anthropol."},{"key":"ref_41","unstructured":"Puelke, D. (2024). Boredom Recognition in Manufacturing Tasks Using Physiological Signals. [Ph.D. Thesis, TU Wien]."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Yuvaraj, R., Samyuktha, S., Fogarty, J., Huang, J.S., Tan, S., and Kiong, W.T. (2025). Automated Boredom Recognition Using Multimodal Physiological Signals. IEEE Trans. Affect. Comput., early access.","DOI":"10.1109\/TAFFC.2025.3619979"},{"key":"ref_43","first-page":"22","article-title":"Automatic Recognition of Facial Actions in Spontaneous Expressions","volume":"1","author":"Littlewort","year":"2006","journal-title":"J. Multimed."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Ekman, P., and Friesen, W.V. (1978). Facial Action Coding System: A Technique for the Measurement of Facial Movement, Consulting Psychologists Press.","DOI":"10.1037\/t27734-000"},{"key":"ref_45","unstructured":"Noldus Information Technology (2025, October 24). FaceReader: Action Unit Module. Available online: https:\/\/www.noldus.com\/facereader."},{"key":"ref_46","unstructured":"BIOPAC Systems, Inc (2025, October 24). Facial Action Units. Available online: https:\/\/www.biopac.com\/facial-action-units\/."},{"key":"ref_47","unstructured":"Emotiva (2025, October 24). Action Units. Available online: https:\/\/emotiva.it\/en\/action-units\/."},{"key":"ref_48","unstructured":"Wang, N., Rebolledo-Mendez, G., Matsuda, N., Santos, O.C., and Dimitrova, V. (2023). Affective Dynamic Based Technique for Facial Emotion Recognition (FER) to Support Intelligent Tutors in Education. Artificial Intelligence in Education, Proceedings of the 24th International Conference, AIED 2023, Tokyo, Japan, 3\u20137 July 2023, Springer."},{"key":"ref_49","first-page":"3243","article-title":"Continuous Facial Emotion Recognition Method Based on Deep Learning of Academic Emotions","volume":"32","author":"Lin","year":"2020","journal-title":"Sens. Mater."},{"key":"ref_50","unstructured":"Hosseini, M.M., Kolahdouzi, F., and Gholipour, A. (2025). Faces of Fairness: Examining Bias in Facial Expression Recognition Datasets and Models. arXiv."},{"key":"ref_51","unstructured":"Raina, R., Monares, M., Xu, M., Fabi, S., Xu, X., Li, L., Sumerfield, W., Gan, J., and de Sa, V.R. (2022, January 2). Exploring biases in facial expression analysis using synthetic faces. Proceedings of the NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research, New Orleans, LA, USA."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Zeng, J., Shan, S., and Chen, X. (2018, January 8\u201314). Facial expression recognition with inconsistently annotated datasets. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_14"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Wang, Z., Li, S., and Wang, Z. (2024). A New Joint Training Method for Facial Expression Recognition with Inconsistently Annotated and Imbalanced Data. Electronics, 13.","DOI":"10.3390\/electronics13193891"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Aydin, G., Aktas, M.S., Fox, G., Gadgil, H., Pierce, M., and Sayar, A. (2005). SERVOGrid Complexity Computational Environments: Integrated Performance Analysis. Proceedings of the 6th International Workshop on Grid Computing (GRID 2005), IEEE.","DOI":"10.1109\/GRID.2005.1542750"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1717","DOI":"10.1002\/cpe.1199","article-title":"VLab: Collaborative Grid Services and Portals to Support Computational Material Science","volume":"19","author":"Nacar","year":"2007","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"2281","DOI":"10.1007\/s00024-006-0137-8","article-title":"Implementing the International Solid Earth Research Virtual Observatory by Integrating Computational Grid and Geographical Information Web Services","volume":"163","author":"Aktas","year":"2006","journal-title":"Pure Appl. Geophys."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1007\/978-3-319-09156-3_35","article-title":"On the Structural Code Clone Detection Problem: A Survey and Software Metric Based Approach","volume":"Volume 8583","author":"Kapdan","year":"2014","journal-title":"Computational Science and Its Applications\u2014ICCSA 2014; LNCS"},{"key":"ref_58","unstructured":"Roy, A.K. (2024, November 06). FERPlus Dataset. Available online: https:\/\/www.kaggle.com\/datasets\/arnabkumarroy02\/ferplus."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1037\/0022-3514.81.1.146","article-title":"Fear, anger, and risk","volume":"81","author":"Lerner","year":"2001","journal-title":"J. Personal. Soc. Psychol."},{"key":"ref_60","unstructured":"Sapolsky, R.M. (2004). Why Zebras Don\u2019t Get Ulcers: The Acclaimed Guide to Stress, Stress-Related Diseases, and Coping, Henry Holt and Company. Holt Paperbacks."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.bbi.2015.06.012","article-title":"Mind\u2013body therapies and control of inflammatory biology: A descriptive review","volume":"51","author":"Bower","year":"2016","journal-title":"Brain Behav. Immun."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Hockey, R. (2013). The Psychology of Fatigue: WORK, Effort, and Control, Cambridge University Press.","DOI":"10.1017\/CBO9781139015394"},{"key":"ref_63","unstructured":"Panksepp, J. (2004). Affective Neuroscience: The Foundations of Human and Animal Emotions, Oxford University Press."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"1367","DOI":"10.1098\/rstb.2004.1512","article-title":"The broaden-and-build theory of positive emotions","volume":"359","author":"Fredrickson","year":"2004","journal-title":"Philos. Trans. R. Soc. B Biol. Sci."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/2\/77\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T10:16:43Z","timestamp":1770113803000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/15\/2\/77"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,1]]},"references-count":64,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["computers15020077"],"URL":"https:\/\/doi.org\/10.3390\/computers15020077","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,1]]}}}