{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T05:04:48Z","timestamp":1781154288105,"version":"3.54.1"},"reference-count":23,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,4,19]],"date-time":"2022-04-19T00:00:00Z","timestamp":1650326400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the Key Research Program of the Chinese Academy of Sciences","award":["Grant NO.ZDRW-ZS-2021-1"],"award-info":[{"award-number":["Grant NO.ZDRW-ZS-2021-1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Video-based dynamic facial emotion recognition (FER) is a challenging task, as one must capture and distinguish tiny facial movements representing emotional changes while ignoring the facial differences of different objects. Recent state-of-the-art studies have usually adopted more complex methods to solve this task, such as large-scale deep learning models or multimodal analysis with reference to multiple sub-models. According to the characteristics of the FER task and the shortcomings of existing methods, in this paper we propose a lightweight method and design three attention modules that can be flexibly inserted into the backbone network. The key information for the three dimensions of space, channel, and time is extracted by means of convolution layer, pooling layer, multi-layer perception (MLP), and other approaches, and attention weights are generated. By sharing parameters at the same level, the three modules do not add too many network parameters while enhancing the focus on specific areas of the face, effective feature information of static images, and key frames. 
The experimental results on CK+ and eNTERFACE\u201905 datasets show that this method can achieve higher accuracy.<\/jats:p>","DOI":"10.3390\/info13050207","type":"journal-article","created":{"date-parts":[[2022,4,19]],"date-time":"2022-04-19T22:07:26Z","timestamp":1650406046000},"page":"207","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Multi-Attention Module for Dynamic Facial Emotion Recognition"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4956-5451","authenticated-orcid":false,"given":"Junnan","family":"Zhi","sequence":"first","affiliation":[{"name":"Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China"},{"name":"School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tingting","family":"Song","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kang","family":"Yu","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fengen","family":"Yuan","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China"},{"name":"School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Huaqiang","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China"},{"name":"School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guangyang","family":"Hu","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China"},{"name":"School of Integrated Circuits, University of Chinese Academy of Sciences, Beijing 100049, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hao","family":"Yang","sequence":"additional","affiliation":[{"name":"Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,19]]},"reference":[{"key":"ref_1","first-page":"1","article-title":"A Systematic Review of Artificial Intelligence (AI) Based Approaches for the Diagnosis of Parkinson\u2019s Disease","volume":"1","author":"Saravanan","year":"2022","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Jiang, Z., Seyedi, S., Haque, R.U., Pongos, A.L., Vickers, K.L., Manzanares, C.M., Lah, J.J., Levey, A.I., and Clifford, G.D. (2022). Automated analysis of facial emotions in subjects with cognitive impairment. PLoS ONE, 17.","DOI":"10.1371\/journal.pone.0262527"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1004","DOI":"10.1017\/S1355617714000939","article-title":"Facial and bodily emotion recognition in multiple sclerosis: The role of alexithymia and other characteristics of the disease","volume":"20","author":"Cecchetto","year":"2014","journal-title":"J. Int. Neuropsychol. Soc."},{"key":"ref_4","unstructured":"Shan, L., and Deng, W. (2022, March 01). Deep Facial Expression Recognition: A Survey. IEEE Transactions on Affective Computing. Available online: https:\/\/ieeexplore.ieee.org\/abstract\/document\/9039580."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ekman, R. (1997). 
What the Face Reveals: Basic and Applied Studies of Spontaneous Expression Using the Facial Action Coding System (FACS), Oxford University Press.","DOI":"10.1093\/oso\/9780195104462.001.0001"},{"key":"ref_6","unstructured":"Littlewort, G., Bartlett, M.S., Fasel, I., Susskind, J., and Movellan, J. (July, January 27). Dynamics of facial expression extracted automatically from video. Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop, Washington, DC, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"803","DOI":"10.1016\/j.imavis.2008.08.005","article-title":"Facial expression recognition based on Local Binary Patterns: A comprehensive study","volume":"27","author":"Shan","year":"2009","journal-title":"Image Vis. Comput."},{"key":"ref_8","unstructured":"Kahou, S.E., Michalski, V., Konda, K., Memisevic, R., and Pal, C. (2015, January 9). Recurrent Neural Networks for Emotion Recognition in Video. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, New York, NY, USA."},{"key":"ref_9","first-page":"12","article-title":"Facial Expression Recognition Using 3D Convolutional Neural Network","volume":"5","author":"Byeon","year":"2014","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_10","unstructured":"Nakano, Y. (2016, January 12\u201316). Video-based emotion recognition using CNN-RNN and C3D hybrid networks. Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1109\/TAFFC.2017.2713783","article-title":"Audio-Visual Emotion Recognition in Video Clips","volume":"10","author":"Noroozi","year":"2017","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ma, F., Li, Y., Ni, S., Huang, S., and Zhang, L. (2022). Data Augmentation for Audio\u2013Visual Emotion Recognition with an Efficient Multimodal Conditional GAN. Appl. 
Sci., 12.","DOI":"10.3390\/app12010527"},{"key":"ref_13","unstructured":"Kanade, T., Tian, Y., and Cohn, J.F. (2002, January 28\u201330). Comprehensive database for facial expression analysis. Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition, Grenoble, France."},{"key":"ref_14","first-page":"383","article-title":"The eNTERFACE\u201905 Audio-Visual Emotion Database, International Conference on Data Engineering Workshops","volume":"8","author":"Martin","year":"2006","journal-title":"IEEE Comput. Soc."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Meng, D., Peng, X., Wang, K., and Qiao, Y. (2019, January 22\u201325). Frame Attention Networks for Facial Expression Recognition in Videos. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803603"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sepas-Moghaddam, A., Etemad, A., Pereira, F., and Correia, L.P. (2020, January 4\u20138). Facial emotion recognition using light field images with deep attention-based bidirectional LSTM. Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053919"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Aminbeidokhti, M., Pedersoli, M., Cardinal, P., and Granger, E. (2019). Emotion recognition with spatial attention and temporal softmax pooling. International Conference on Image Analysis and Recognition, Springer.","DOI":"10.1007\/978-3-030-27202-9_29"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1109\/LSP.2021.3063609","article-title":"A two-stage spatiotemporal attention convolution network for continuous dimensional emotion recognition from facial video","volume":"28","author":"Hu","year":"2021","journal-title":"IEEE Signal Process. 
Lett."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wu, J., and Hoashi, K. (2019, January 14). Multi-attention fusion network for video-based emotion recognition. Proceedings of the 2019 International Conference on Multimodal Interaction, Association for Computing Machinery, New York, NY, USA.","DOI":"10.1145\/3340555.3355720"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_21","unstructured":"Lin, M., Chen, Q., and Yan, S. (2013). Network in network. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018). CBAM: Convolutional block attention module. arXiv.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_23","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/5\/207\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:56:31Z","timestamp":1760136991000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/5\/207"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,19]]},"references-count":23,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["info13050207"],"URL":"https:\/\/doi.org\/10.3390\/info13050207","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,19]]}}}