{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T11:04:58Z","timestamp":1778497498046,"version":"3.51.4"},"reference-count":77,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2022,8,1]],"date-time":"2022-08-01T00:00:00Z","timestamp":1659312000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100010418","name":"Korea government (MSIT)","doi-asserted-by":"publisher","award":["2021-0-00087"],"award-info":[{"award-number":["2021-0-00087"]}],"id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Despite advanced machine learning methods, the implementation of emotion recognition systems based on real-world video content remains challenging. Videos may contain data such as images, audio, and text. However, the application of multimodal models using two or more types of data to real-world video media (CCTV, illegally filmed content, etc.) lacking sound or subtitles is difficult. Although facial expressions in image sequences can be utilized in emotion recognition, the diverse identities of individuals in real-world content limit computational models of relationships between facial expressions. This study proposed a transformation model which employed a video vision transformer to focus on facial expression sequences in videos. It effectively understood and extracted facial expression information from the identities of individuals, instead of fusing multimodal models. The design entailed capture of higher-quality facial expression information through mixed-token embedding facial expression sequences augmented via various methods into a single data representation, and comprised two modules: spatial and temporal encoders. 
Further, temporal position embedding, focusing on relationships between video frames, was proposed and subsequently applied to the temporal encoder module. The performance of the proposed algorithm was compared with that of conventional methods on two emotion recognition datasets of video content, with results demonstrating its superiority.<\/jats:p>","DOI":"10.3390\/s22155753","type":"journal-article","created":{"date-parts":[[2022,8,1]],"date-time":"2022-08-01T23:49:27Z","timestamp":1659397767000},"page":"5753","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["SMaTE: A Segment-Level Feature Mixing and Temporal Encoding Framework for Facial Expression Recognition"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1617-303X","authenticated-orcid":false,"given":"Nayeon","family":"Kim","sequence":"first","affiliation":[{"name":"Communication and Media Engineering, University of Science and Technology, 217, Gajeong-ro, Yuseong-gu, Daejeon 34113, Korea"}]},{"given":"Sukhee","family":"Cho","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute, 218, Gajeong-ro, Yuseong-gu, Daejeon 34129, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0872-325X","authenticated-orcid":false,"given":"Byungjun","family":"Bae","sequence":"additional","affiliation":[{"name":"Communication and Media Engineering, University of Science and Technology, 217, Gajeong-ro, Yuseong-gu, Daejeon 34113, Korea"},{"name":"Electronics and Telecommunications Research Institute, 218, Gajeong-ro, Yuseong-gu, Daejeon 34129, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,1]]},"reference":[{"key":"ref_1","first-page":"16","article-title":"Basic emotions","volume":"98","author":"Ekman","year":"1999","journal-title":"Handb. Cogn. 
Emot."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1016\/S1077-3142(03)00081-X","article-title":"Facial expression recognition from video sequences: Temporal and static modeling","volume":"91","author":"Cohen","year":"2003","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_3","unstructured":"Chibelushi, C.C., and Bourel, F. (2022, May 02). Facial Expression Recognition: A Brief Tutorial Overview. CVonline: On-Line Compendium of Computer Vision. Available online: https:\/\/s2.smu.edu\/~mhd\/8331f06\/CCC.pdf."},{"key":"ref_4","unstructured":"Den Uyl, M., and Van Kuilenburg, H. (September, January 30). The FaceReader: Online facial expression recognition. Proceedings of the Measuring Behavior, Wageningen, The Netherlands."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Liu, S.S., Tian, Y.T., and Li, D. (2009, January 12\u201315). New research advances of facial expression recognition. Proceedings of the 2009 International Conference on Machine Learning and Cybernetics, Baoding, China.","DOI":"10.1109\/ICMLC.2009.5212409"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"803","DOI":"10.1016\/j.imavis.2008.08.005","article-title":"Facial expression recognition based on local binary patterns: A comprehensive study","volume":"27","author":"Shan","year":"2009","journal-title":"Image Vis. Comput."},{"key":"ref_7","first-page":"1552","article-title":"Facial expression recognition","volume":"2","author":"Sarode","year":"2010","journal-title":"Int. J. Comput. Sci. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Insaf, A., Ouahabi, A., Benzaoui, A., and Taleb Ahmed, A. (2020). Past, Present, and Future of Face Recognition: A Review. Electronics, 9.","DOI":"10.3390\/electronics9081188"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Schuller, B., Rigoll, G., and Lang, M. (2003, January 6\u20139). Hidden Markov model-based speech emotion recognition. 
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Baltimore, MD, USA.","DOI":"10.1109\/ICME.2003.1220939"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1016\/S0167-6393(03)00099-2","article-title":"Speech emotion recognition using hidden Markov models","volume":"41","author":"Nwe","year":"2003","journal-title":"Speech Commun."},{"key":"ref_11","unstructured":"Schuller, B., Rigoll, G., and Lang, M. (2004, January 17\u201321). Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada."},{"key":"ref_12","unstructured":"Lin, Y.L., and Wei, G. (2005, January 18\u201321). Speech emotion recognition based on HMM and SVM. Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hu, H., Xu, M.X., and Wu, W. (2007, January 15\u201320). GMM supervector based SVM with spectral features for speech emotion recognition. Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP\u201907, Honolulu, HI, USA.","DOI":"10.1109\/ICASSP.2007.366937"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chatfield, K., Simonyan, K., Vedaldi, A., and Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. arXiv.","DOI":"10.5244\/C.28.6"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. 
(2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A.A. (2017, January 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_19","unstructured":"Zaremba, W., Sutskever, I., and Vinyals, O. (2014). Recurrent neural network regularization. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_21","unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TAFFC.2014.2386334","article-title":"Automatic facial expression recognition using features of salient facial patches","volume":"6","author":"Happy","year":"2014","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Jung, H., Lee, S., Yim, J., Park, S., and Kim, J. (2015, January 7\u201313). Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.341"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yu, Z., and Zhang, C. (2015, January 9\u201313). Image based static facial expression recognition with multiple deep network learning. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA.","DOI":"10.1145\/2818346.2830595"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"609673","DOI":"10.3389\/frai.2020.609673","article-title":"Emotionnet nano: An efficient deep convolutional neural network design for real-time facial expression recognition","volume":"3","author":"Lee","year":"2021","journal-title":"Front. Artif. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"El Morabit, S., Rivenq, A., Zighem, M.E.n., Hadid, A., Ouahabi, A., and Taleb-Ahmed, A. (2021). Automatic Pain Estimation from Facial Expressions: A Comparative Analysis Using Off-the-Shelf CNN Architectures. Electronics, 10.","DOI":"10.3390\/electronics10161926"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"117327","DOI":"10.1109\/ACCESS.2019.2936124","article-title":"Speech emotion recognition using deep learning techniques: A review","volume":"7","author":"Khalil","year":"2019","journal-title":"IEEE Access"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/j.bspc.2018.08.035","article-title":"Speech emotion recognition using deep 1D & 2D CNN LSTM networks","volume":"47","author":"Zhao","year":"2019","journal-title":"Biomed. Signal Process. Control"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1016\/j.imavis.2011.07.002","article-title":"Facial expression recognition from near-infrared videos","volume":"29","author":"Zhao","year":"2011","journal-title":"Image Vis. 
Comput."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1109\/TAFFC.2016.2593719","article-title":"Facial expression recognition in video with multiple feature fusion","volume":"9","author":"Chen","year":"2016","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sun, M., Li, J., Feng, H., Gou, W., Shen, H., Tang, J., Yang, Y., and Ye, J. (2020, January 25\u201329). Multi-Modal Fusion Using Spatio-Temporal and Static Features for Group Emotion Recognition. Proceedings of the 2020 International Conference on Multimodal Interaction, Virtual Event.","DOI":"10.1145\/3382507.3417971"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Dhall, A., Sharma, G., Goecke, R., and Gedeon, T. (2020, January 25\u201329). Emotiw 2020: Driver gaze, group emotion, student engagement and physiological signal based challenges. Proceedings of the 2020 International Conference on Multimodal Interaction, New York, NY, USA.","DOI":"10.1145\/3382507.3417973"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, C., Jiang, W., Wang, M., and Tang, T. (2020). Group Level Audio-Video Emotion Recognition Using Hybrid Networks. Group Level Audio-Video Emotion Recognition Using Hybrid Networks, Association for Computing Machinery.","DOI":"10.1145\/3382507.3417968"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Fan, Y., Lu, X., Li, D., and Liu, Y. (2016, January 12\u201316). Video-based emotion recognition using CNN-RNN and C3D hybrid networks. Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan.","DOI":"10.1145\/2993148.2997632"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Dhall, A., Goecke, R., Joshi, J., Hoey, J., and Gedeon, T. (2016, January 12\u201316). Emotiw 2016: Video and group-level emotion recognition challenges. 
Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan.","DOI":"10.1145\/2993148.2997638"},{"key":"ref_36","unstructured":"Dhall, A., Goecke, R., Lucey, S., and Gedeon, T. (2011). Acted Facial Expressions in the Wild Database, Australian National University. Technical Report TR-CS-11."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1109\/MMUL.2012.26","article-title":"Collecting large, richly annotated facial-expression databases from movies","volume":"19","author":"Dhall","year":"2012","journal-title":"IEEE Multimed."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bargal, S.A., Barsoum, E., Ferrer, C.C., and Zhang, C. (2016, January 12\u201316). Emotion recognition in the wild from videos using images. Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan.","DOI":"10.1145\/2993148.2997627"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Huang, Y., Chen, F., Lv, S., and Wang, X. (2019). Facial expression recognition: A survey. Symmetry, 11.","DOI":"10.3390\/sym11101189"},{"key":"ref_40","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_41","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30, Available online: https:\/\/dl.acm.org\/doi\/10.5555\/3295222.3295349."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lu\u010di\u0107, M., and Schmid, C. (2021, January 10\u201317). Vivit: A video vision transformer. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"ref_43","unstructured":"Kanade, T., Cohn, J.F., and Tian, Y. (2000, January 28\u201330). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580), Grenoble, France."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., and Matthews, I. (2010, January 13\u201318). The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543262"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1272","DOI":"10.1002\/jemt.23686","article-title":"Computer vision for microscopic skin cancer diagnosis using handcrafted and non-handcrafted features","volume":"84","author":"Saba","year":"2021","journal-title":"Microsc. Res. Tech."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"58","DOI":"10.4018\/IJHISI.20210701.oa4","article-title":"Dermatoscopy using multi-layer perceptron, convolution neural network, and capsule network to differentiate malignant melanoma from benign nevus","volume":"16","author":"Tiwari","year":"2021","journal-title":"Int. J. Healthc. Inf. Syst. Inform. (IJHISI)"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1016\/j.neucom.2015.06.079","article-title":"A multi-task model for simultaneous face identification and facial expression recognition","volume":"171","author":"Zheng","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Rassadin, A., Gruzdev, A., and Savchenko, A. (2017, January 13\u201317). 
Group-level emotion recognition using transfer learning from face identification. Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK.","DOI":"10.1145\/3136755.3143007"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Vu, M.T., Beurton-Aimar, M., and Marchand, S. (2021, January 11\u201317). Multitask multi-database emotion recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00406"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chang, X., and Skarbek, W. (June, January 25). From face identification to emotion recognition. Proceedings of the Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2019, International Society for Optics and Photonics, Wilga, Poland.","DOI":"10.1117\/12.2536735"},{"key":"ref_51","unstructured":"Lee, J., Kim, S., Kim, S., Park, J., and Sohn, K. (November, January 27). Context-Aware Emotion Recognition Networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Mittal, T., Guhan, P., Bhattacharya, U., Chandra, R., Bera, A., and Manocha, D. (2020, January 13\u201319). EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege\u2019s Principle. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01424"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.neucom.2018.07.028","article-title":"Spatio-temporal convolutional features with nested LSTM for facial expression recognition","volume":"317","author":"Yu","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_54","unstructured":"Breuer, R., and Kimmel, R. (2017). A deep learning perspective on the origin of facial expressions. 
arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Ko, B.C. (2018). A brief review of facial emotion recognition based on visual information. Sensors, 18.","DOI":"10.3390\/s18020401"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Fan, Y., Lam, J.C., and Li, V.O. (2018, January 4\u20137). Multi-region ensemble convolutional neural network for facial expression recognition. Proceedings of the International Conference on Artificial Neural Networks, Rhodes, Greece.","DOI":"10.1007\/978-3-030-01418-6_9"},{"key":"ref_57","unstructured":"Kayhan, O.S., and Gemert, J.C.V. (2020, January 13\u201319). On translation invariance in cnns: Convolutional layers can exploit absolute spatial location. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA."},{"key":"ref_58","unstructured":"Kondor, R., and Trivedi, S. (2018, January 10\u201315). On the generalization of equivariance and convolution in neural networks to the action of compact groups. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_59","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst., 25."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1037\/a0036052","article-title":"Perceptions of emotion from facial expressions are not culturally universal: Evidence from a remote culture","volume":"14","author":"Gendron","year":"2014","journal-title":"Emotion"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1080\/10417949109372824","article-title":"Cultural influences on facial expressions of emotion","volume":"56","author":"Matsumoto","year":"1991","journal-title":"South. J. 
Commun."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"795","DOI":"10.3389\/fpsyg.2020.00795","article-title":"Attentional Bias to Facial Expressions of Different Emotions\u2014A Cross-Cultural Comparison of \u2260Akhoe Hai||om and German Children and Adolescents","volume":"11","author":"Pritsch","year":"2020","journal-title":"Front. Psychol."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Savchenko, A.V. (2021, January 16\u201318). Facial expression and attributes recognition based on multi-task learning of lightweight neural networks. Proceedings of the 2021 IEEE 19th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia.","DOI":"10.1109\/SISY52375.2021.9582508"},{"key":"ref_64","unstructured":"He, Y., Xu, D., Wu, L., Jian, M., Xiang, S., and Pan, C. (2019). Lffd: A light and fast face detector for edge devices. arXiv."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Yang, S., Luo, P., Loy, C.C., and Tang, X. (2016, January 27\u201330). Wider face: A face detection benchmark. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.596"},{"key":"ref_66","unstructured":"Chen, C. PyTorch Face Landmark: A Fast and Accurate Facial Landmark Detector. 2021."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on image data augmentation for deep learning","volume":"6","author":"Shorten","year":"2019","journal-title":"J. Big Data"},{"key":"ref_68","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020, January 13\u201318). A simple framework for contrastive learning of visual representations. Proceedings of the International Conference on Machine Learning, Virtual Event."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Park, S.J., Kim, B.G., and Chilamkurti, N. (2021). 
A Robust Facial Expression Recognition Algorithm Based on Multi-Rate Feature Fusion Scheme. Sensors, 21.","DOI":"10.3390\/s21216954"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Ahonen, T., Hadid, A., and Pietik\u00e4inen, M. (2004, January 11\u201314). Face recognition with local binary patterns. Proceedings of the European Conference on Computer Vision, Prague, Czech Republic.","DOI":"10.1007\/978-3-540-24670-1_36"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.ins.2016.04.021","article-title":"Extended local binary patterns for face recognition","volume":"358\u2013359","author":"Liu","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"061407","DOI":"10.1117\/1.JEI.25.6.061407","article-title":"Facial expression recognition in the wild based on multimodal texture features","volume":"25","author":"Sun","year":"2016","journal-title":"J. Electron. Imaging"},{"key":"ref_73","unstructured":"Hendrycks, D., and Gimpel, K. (2016). Gaussian error linear units (gelus). arXiv."},{"key":"ref_74","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 23\u201328). Large-scale video classification with convolutional neural networks. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Minaee, S., Minaei, M., and Abdolrashidi, A. (2021). Deep-emotion: Facial expression recognition using attentional convolutional network. Sensors, 21.","DOI":"10.3390\/s21093046"},{"key":"ref_77","unstructured":"Pourmirzaei, M., Montazer, G.A., and Esmaili, F. (2021). 
Using Self-Supervised Auxiliary Tasks to Improve Fine-Grained Facial Representation. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5753\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:00:54Z","timestamp":1760140854000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5753"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,1]]},"references-count":77,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["s22155753"],"URL":"https:\/\/doi.org\/10.3390\/s22155753","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,1]]}}}