{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,22]],"date-time":"2026-02-22T04:18:44Z","timestamp":1771733924034,"version":"3.50.1"},"reference-count":60,"publisher":"Springer Science and Business Media LLC","issue":"24","license":[{"start":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T00:00:00Z","timestamp":1639353600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T00:00:00Z","timestamp":1639353600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014013","name":"UKRI","doi-asserted-by":"crossref","award":["UKV929394"],"award-info":[{"award-number":["UKV929394"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000836","name":"University of Liverpool","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000836","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Human emotion recognition is an active research area in artificial intelligence and has made substantial progress over the past few years. Many recent works mainly focus on facial regions to infer human affection, while the surrounding context information is not effectively utilized. In this paper, we proposed a new deep network to effectively recognize human emotions using a novel global-local attention mechanism. Our network is designed to extract features from both facial and context regions independently, then learn them together using the attention module. 
In this way, both the facial and contextual information is used to infer human emotions, therefore enhancing the discrimination of the classifier. The intensive experiments show that our method surpasses the current state-of-the-art methods on recent emotion datasets by a fair margin. Qualitatively, our global-local attention module can extract more meaningful attention maps than previous methods. The source code and trained model of our network are available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/minhnhatvt\/glamor-net\">https:\/\/github.com\/minhnhatvt\/glamor-net<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s00521-021-06778-x","type":"journal-article","created":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T13:03:01Z","timestamp":1639400581000},"page":"21625-21639","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["Global-local attention for emotion recognition"],"prefix":"10.1007","volume":"34","author":[{"given":"Nhat","family":"Le","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Khanh","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1449-211X","authenticated-orcid":false,"given":"Anh","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bac","family":"Le","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,12,13]]},"reference":[{"key":"6778_CR1","unstructured":"Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray DG, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X (2016) 
Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX conference on operating systems design and implementation, OSDI\u201916, p. 265\u2013283"},{"key":"6778_CR2","unstructured":"Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: Bengio Y, LeCun Y (eds.) 3rd international conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, conference track proceedings. http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"6778_CR3","doi-asserted-by":"crossref","unstructured":"Bosch A, Zisserman A, Munoz X (2007) Representing shape with a spatial pyramid kernel. In: Proceedings of the 6th ACM international conference on image and video retrieval, CIVR \u201907. Association for computing machinery, New York, NY, USA, p. 401\u2013408. https:\/\/doi.org\/10.1145\/1282280.1282340","DOI":"10.1145\/1282280.1282340"},{"key":"6778_CR4","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1007\/978-3-540-85099-1_8","volume-title":"Affect and emotion in human-computer interaction, from theory to applications, lecture notes in computer science","author":"G Castellano","year":"2008","unstructured":"Castellano G, Kessous L, Caridakis G (2008) Emotion recognition through multiple modalities: face, body gesture, speech. In: Peter C, Beale R (eds) Affect and emotion in human-computer interaction, from theory to applications, lecture notes in computer science. Springer, New York, pp 92\u2013103"},{"key":"6778_CR5","first-page":"884","volume-title":"International workshops on electrical and computer engineering subfields","author":"J Chen","year":"2014","unstructured":"Chen J, Chen Z, Chi Z, Fu H et al (2014) Facial expression recognition based on facial components detection and hog features. 
In: International workshops on electrical and computer engineering subfields, pp 884\u2013888"},{"key":"6778_CR6","doi-asserted-by":"crossref","unstructured":"Chen Y, Wang J, Chen S, Shi Z, Cai J (2019) Facial motion prior networks for facial expression recognition. In: 2019 IEEE visual communications and image processing, VCIP 2019, Sydney, Australia, December 1\u20134, 2019 IEEE, pp. 1\u20134. https:\/\/doi.org\/10.1109\/VCIP47243.2019.8965826","DOI":"10.1109\/VCIP47243.2019.8965826"},{"key":"6778_CR7","unstructured":"Chorowski J, Bahdanau D, Serdyuk D, Cho K, Bengio Y (2015) Attention-based models for speech recognition. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds.) Advances in neural information processing systems 28: annual conference on neural information processing systems 2015, December 7\u201312, 2015, Montreal, Quebec, Canada, pp. 577\u2013585. http:\/\/papers.nips.cc\/paper\/5847-attention-based-models-for-speech-recognition"},{"key":"6778_CR8","doi-asserted-by":"publisher","first-page":"920","DOI":"10.3389\/fpsyg.2020.00920","volume":"11","author":"EA Clark","year":"2020","unstructured":"Clark EA, Kessinger J, Duncan SE, Bell MA, Lahne J, Gallagher DL, O\u2019Keefe SF (2020) The facial action coding system for characterization of human affective response to consumer product-based stimuli: a systematic review. Front Psychol 11:920","journal-title":"Front Psychol"},{"key":"6778_CR9","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1016\/j.specom.2008.03.012","volume":"50","author":"C Clavel","year":"2008","unstructured":"Clavel C, Vasilescu I, Devillers L, Richard G, Ehrette T (2008) Fear-type emotion recognition for future audio-based surveillance systems. Speech Commun 50:487\u2013503. 
https:\/\/doi.org\/10.1016\/j.specom.2008.03.012","journal-title":"Speech Commun"},{"issue":"8","key":"6778_CR10","doi-asserted-by":"publisher","first-page":"1548","DOI":"10.1109\/TPAMI.2016.2515606","volume":"38","author":"CA Corneanu","year":"2016","unstructured":"Corneanu CA, Sim\u00f3n MO, Cohn JF, Guerrero SE (2016) Survey on rgb, 3d, thermal, and multimodal approaches for facial expression recognition: history, trends, and affect-related applications. IEEE Trans Pattern Anal Mach Intell 38(8):1548\u20131568","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"6778_CR11","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1109\/79.911197","volume":"18","author":"R Cowie","year":"2001","unstructured":"Cowie R, Douglas-Cowie E, Tsapatsoulis N, Votsis G, Kollias S, Fellenz W, Taylor J (2001) Emotion recognition in human-computer interaction. Signal Process Mag IEEE 18:32\u201380. https:\/\/doi.org\/10.1109\/79.911197","journal-title":"Signal Process Mag IEEE"},{"key":"6778_CR12","doi-asserted-by":"crossref","unstructured":"Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR\u201905) - Volume 1 - Volume 01, CVPR \u201905. IEEE Computer Society, USA, p. 886-893. https:\/\/doi.org\/10.1109\/CVPR.2005.177","DOI":"10.1109\/CVPR.2005.177"},{"issue":"3","key":"6778_CR13","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/MMUL.2012.26","volume":"19","author":"A Dhall","year":"2012","unstructured":"Dhall A, Goecke R, Lucey S, Gedeon T (2012) Collecting large, richly annotated facial-expression databases from movies. IEEE MultiMedia 19(3):34\u201341. 
https:\/\/doi.org\/10.1109\/MMUL.2012.26","journal-title":"IEEE MultiMedia"},{"issue":"7","key":"6778_CR14","doi-asserted-by":"publisher","first-page":"1895","DOI":"10.1162\/089976698300017197","volume":"10","author":"TG Dietterich","year":"1998","unstructured":"Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895\u20131923. https:\/\/doi.org\/10.1162\/089976698300017197","journal-title":"Neural Comput"},{"key":"6778_CR15","doi-asserted-by":"crossref","unstructured":"Do T, Nguyen BX, Tjiputra E, Tran M, Tran QD, Nguyen A (2021) Multiple meta-model quantifying for medical visual question answering. arXiv preprint arXiv:2105.08913","DOI":"10.1007\/978-3-030-87240-3_7"},{"key":"6778_CR16","doi-asserted-by":"crossref","unstructured":"Do TT, Nguyen A, Reid I (2018) Affordancenet: an end-to-end deep learning approach for object affordance detection. In: 2018 IEEE international conference on robotics and automation (ICRA), IEEE. pp. 5882\u20135889","DOI":"10.1109\/ICRA.2018.8460902"},{"issue":"11","key":"6778_CR17","doi-asserted-by":"publisher","first-page":"7539","DOI":"10.1007\/s00521-019-04279-6","volume":"32","author":"SR Dubey","year":"2020","unstructured":"Dubey SR, Roy SK, Chakraborty S, Mukherjee S, Chaudhuri BB (2020) Local bit-plane decoded convolutional neural network features for biomedical image retrieval. Neural Comput Appl 32(11):7539\u20137551","journal-title":"Neural Comput Appl"},{"issue":"2","key":"6778_CR18","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1037\/h0030377","volume":"17","author":"P Ekman","year":"1971","unstructured":"Ekman P, Friesen W (1971) Constants across cultures in the face and emotion. 
J Personal Soc Psychol 17(2):124\u2013129","journal-title":"J Personal Soc Psychol"},{"issue":"3","key":"6778_CR19","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1016\/j.patcog.2010.09.020","volume":"44","author":"M El Ayadi","year":"2011","unstructured":"El Ayadi M, Kamel MS, Karray F (2011) Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recognit 44(3):572\u2013587","journal-title":"Pattern Recognit"},{"key":"6778_CR20","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/3-540-44673-7_12","volume-title":"Machine learning and its applications","author":"T Evgeniou","year":"2001","unstructured":"Evgeniou T, Pontil M (2001) Support vector machines: theory and applications. Machine learning and its applications. Springer, Berlin Heidelberg, pp. 249\u2013257"},{"key":"6778_CR21","doi-asserted-by":"crossref","unstructured":"Galassi A, Lippi M, Torroni P (2020) Attention in natural language processing. IEEE Trans Neural Netw Learn Syst","DOI":"10.1109\/TNNLS.2020.3019893"},{"key":"6778_CR22","doi-asserted-by":"publisher","first-page":"6488","DOI":"10.1109\/ACCESS.2020.3048693","volume":"9","author":"Q Gao","year":"2021","unstructured":"Gao Q, Zeng H, Li G, Tong T (2021) Graph reasoning-based emotion recognition network. IEEE Access 9:6488\u20136497. https:\/\/doi.org\/10.1109\/ACCESS.2020.3048693","journal-title":"IEEE Access"},{"key":"6778_CR23","doi-asserted-by":"publisher","first-page":"64827","DOI":"10.1109\/ACCESS.2019.2917266","volume":"7","author":"M Georgescu","year":"2019","unstructured":"Georgescu M, Ionescu RT, Popescu M (2019) Local learning with deep and handcrafted features for facial expression recognition. IEEE Access 7:64827\u201364836. 
https:\/\/doi.org\/10.1109\/ACCESS.2019.2917266","journal-title":"IEEE Access"},{"key":"6778_CR24","doi-asserted-by":"crossref","unstructured":"Han K, Yu D, Tashev I (2014) Speech emotion recognition using deep neural network and extreme learning machine. In: Interspeech 2014","DOI":"10.21437\/Interspeech.2014-57"},{"key":"6778_CR25","first-page":"770","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"K He","year":"2016","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778"},{"key":"6778_CR26","first-page":"217","volume-title":"Emotion recognition using voice based on emotion-sensitive frequency ranges","author":"KH Hyun","year":"2007","unstructured":"Hyun KH, Kim EH, Kwak YK (2007) Emotion recognition using voice based on emotion-sensitive frequency ranges. Springer, Berlin, Heidelberg, pp 217\u2013223"},{"issue":"1","key":"6778_CR27","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1214\/088342304000000026","volume":"19","author":"MI Jordan","year":"2004","unstructured":"Jordan MI (2004) Graphical models. Stat Sci 19(1):140\u2013155","journal-title":"Stat Sci"},{"key":"6778_CR28","first-page":"1755","volume":"10","author":"D King","year":"2009","unstructured":"King D (2009) Dlib-ml: a machine learning toolkit. J Mach Learn Res 10:1755\u20131758","journal-title":"J Mach Learn Res"},{"issue":"11","key":"6778_CR29","first-page":"2755","volume":"42","author":"R Kosti","year":"2019","unstructured":"Kosti R, Alvarez JM, Recasens A, Lapedriza A (2019) Context based emotion recognition using emotic dataset. 
IEEE Trans Pattern Anal Mach Intell 42(11):2755\u20132766","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"6778_CR30","first-page":"1097","volume-title":"Advances in neural information processing systems","author":"A Krizhevsky","year":"2012","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems, vol 25. Curran Associates Inc, Red Hook, pp 1097\u20131105"},{"key":"6778_CR31","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems - Volume 1, NIPS\u201912. Curran Associates Inc., Red Hook, NY, USA, pp. 1097\u20131105"},{"key":"6778_CR32","doi-asserted-by":"publisher","unstructured":"Lee J, Kim S, Kim S, Park J, Sohn K (2019) Context-aware emotion recognition networks. In: 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), pp. 10142\u201310151. https:\/\/doi.org\/10.1109\/ICCV.2019.01024","DOI":"10.1109\/ICCV.2019.01024"},{"issue":"6","key":"6778_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3363574","volume":"13","author":"JB Lee","year":"2019","unstructured":"Lee JB, Rossi RA, Kim S, Ahmed NK, Koh E (2019) Attention models in graphs: a survey. ACM Trans Knowl Discov Data 13(6):1\u201325","journal-title":"ACM Trans Knowl Discov Data"},{"key":"6778_CR34","doi-asserted-by":"crossref","unstructured":"Li S, Deng W (2020) Deep facial expression recognition: a survey. IEEE transactions on affective computing p. 1\u20131. http:\/\/dx.doi.org\/10.1109\/TAFFC.2020.2981446","DOI":"10.1109\/TAFFC.2020.2981446"},{"key":"6778_CR35","doi-asserted-by":"crossref","unstructured":"Liu X, Kumar BVKV, You J, Jia P (2017) Adaptive deep metric learning for identity-aware facial expression recognition. 
In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 522\u2013531","DOI":"10.1109\/CVPRW.2017.79"},{"key":"6778_CR36","doi-asserted-by":"publisher","unstructured":"Lucey P, Cohn JF, Kanade T, Saragih J, Ambadar Z, Matthews I (2010) The extended cohn-kanade dataset (ck+): a complete dataset for action unit and emotion-specified expression. In: 2010 IEEE computer society conference on computer vision and pattern recognition -workshops, pp. 94\u2013101. https:\/\/doi.org\/10.1109\/CVPRW.2010.5543262","DOI":"10.1109\/CVPRW.2010.5543262"},{"key":"6778_CR37","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1007\/BF00992972","volume":"16","author":"D Matsumoto","year":"1992","unstructured":"Matsumoto D (1992) More evidence for the universality of a contempt expression. Motiv Emot 16:363\u2013368","journal-title":"Motiv Emot"},{"key":"6778_CR38","doi-asserted-by":"publisher","unstructured":"Meng D, Peng X, Wang K, Qiao Y (2019) Frame attention networks for facial expression recognition in videos. In: 2019 IEEE international conference on image processing (ICIP), pp. 3866\u20133870. https:\/\/doi.org\/10.1109\/ICIP.2019.8803603","DOI":"10.1109\/ICIP.2019.8803603"},{"key":"6778_CR39","doi-asserted-by":"publisher","unstructured":"Mittal T, Guhan P, Bhattacharya U, Chandra R, Bera A, Manocha D (2020) Emoticon: context-aware multimodal emotion recognition using frege\u2019s principle. In: 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14222\u201314231. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01424","DOI":"10.1109\/CVPR42600.2020.01424"},{"issue":"1","key":"6778_CR40","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1109\/TAFFC.2017.2740923","volume":"10","author":"A Mollahosseini","year":"2019","unstructured":"Mollahosseini A, Hasani B, Mahoor MH (2019) Affectnet: a database for facial expression, valence, and arousal computing in the wild. IEEE Trans Affect Comput 10(1):18\u201331. 
https:\/\/doi.org\/10.1109\/TAFFC.2017.2740923","journal-title":"IEEE Trans Affect Comput"},{"key":"6778_CR41","doi-asserted-by":"crossref","unstructured":"Nguyen A, Do TT, Reid I, Caldwell DG, Tsagarakis NG (2019) V2cnet: a deep learning framework to translate videos to commands for robotic manipulation. arXiv preprint arXiv:1903.10869","DOI":"10.1109\/ICRA.2018.8460857"},{"key":"6778_CR42","doi-asserted-by":"crossref","unstructured":"Nguyen A, Nguyen N, Tran K, Tjiputra E, Tran QD (2020) Autonomous navigation in complex environments with deep multimodal fusion network. In: 2020 IEEE\/RSJ international conference on intelligent robots and systems (IROS), IEEE. pp. 5824\u20135830","DOI":"10.1109\/IROS45743.2020.9341494"},{"key":"6778_CR43","doi-asserted-by":"crossref","unstructured":"Nguyen BX, Nguyen BD, Do T, Tjiputra E, Tran QD, Nguyen A (2020) Graph-based person signature for person re-identifications. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition workshop, pp. 3492\u20133501","DOI":"10.1109\/CVPRW53098.2021.00388"},{"key":"6778_CR44","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1016\/S0079-6123(06)55002-2","volume":"155","author":"A Oliva","year":"2006","unstructured":"Oliva A, Torralba A (2006) Building the gist of a scene: the role of global image features in recognition. Prog Brain Res 155:23\u201336","journal-title":"Prog Brain Res"},{"key":"6778_CR45","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1007\/978-3-642-21227-7_34","volume-title":"Image analysis","author":"J P\u00e4iv\u00e4rinta","year":"2011","unstructured":"P\u00e4iv\u00e4rinta J, Rahtu E, Heikkil\u00e4 J (2011) Volume local phase quantization for blur-insensitive dynamic texture classification. In: Heyden A, Kahl F (eds) Image analysis. 
Springer, Berlin, Heidelberg, pp 360\u2013369"},{"key":"6778_CR46","doi-asserted-by":"publisher","first-page":"345","DOI":"10.3389\/fpsyg.2013.00345","volume":"4","author":"S Paulmann","year":"2013","unstructured":"Paulmann S, Bleichner M, Kotz SA (2013) Valence, arousal, and task effects in emotional prosody processing. Front Psychol 4:345","journal-title":"Front Psychol"},{"key":"6778_CR47","unstructured":"Randhavane T, Bhattacharya U, Kapsaskis K, Gray K, Bera A, Manocha D (2020) Identifying emotions from walking using affective and deep features. arXiv preprint arXiv:1906.11884"},{"key":"6778_CR48","first-page":"4510","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"M Sandler","year":"2018","unstructured":"Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510\u20134520"},{"issue":"6","key":"6778_CR49","doi-asserted-by":"publisher","first-page":"1113","DOI":"10.1109\/TPAMI.2014.2366127","volume":"37","author":"E Sariyanidi","year":"2015","unstructured":"Sariyanidi E, Gunes H, Cavallaro A (2015) Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Trans Pattern Anal Mach Intell 37(6):1113\u20131133","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"9","key":"6778_CR50","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.1016\/j.neunet.2008.05.003","volume":"21","author":"K Schindler","year":"2008","unstructured":"Schindler K, Van Gool L, de Gelder B (2008) Recognizing emotions expressed by body pose: a biologically inspired neural model. 
Neural Netw 21(9):1238\u20131246","journal-title":"Neural Netw"},{"issue":"6","key":"6778_CR51","doi-asserted-by":"publisher","first-page":"803","DOI":"10.1016\/j.imavis.2008.08.005","volume":"27","author":"C Shan","year":"2009","unstructured":"Shan C, Gong S, McOwan PW (2009) Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis Comput 27(6):803\u2013816","journal-title":"Image Vis Comput"},{"key":"6778_CR52","doi-asserted-by":"publisher","unstructured":"Sikka K, Dykstra K, Sathyanarayana S, Littlewort G (2013) Multiple kernel learning for emotion recognition in the wild. In: ICMI 2013 - Proceedings of the 2013 ACM international conference on multimodal interaction. https:\/\/doi.org\/10.1145\/2522848.2531741","DOI":"10.1145\/2522848.2531741"},{"key":"6778_CR53","doi-asserted-by":"crossref","unstructured":"Sikka K, Wu T, Susskind J, Bartlett M (2012) Exploring bag of words architectures in the facial expression domain. In: Proceedings of the 12th international conference on computer vision - Volume 2, ECCV\u201912. Springer-Verlag, Berlin, Heidelberg, p. 250\u2013259. https:\/\/doi.org\/10.1007\/978-3-642-33868-7_25","DOI":"10.1007\/978-3-642-33868-7_25"},{"key":"6778_CR54","unstructured":"Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds.) 3rd international conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. http:\/\/arxiv.org\/abs\/1409.1556"},{"key":"6778_CR55","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1007\/978-3-642-22158-3_29","volume-title":"Intelligent interactive multimedia systems and services","author":"IO Stathopoulou","year":"2011","unstructured":"Stathopoulou IO, Tsihrintzis GA (2011) Emotion recognition from body movements and gestures. In: Tsihrintzis GA, Virvou M, Jain LC, Howlett RJ (eds) Intelligent interactive multimedia systems and services. 
Springer, Berlin, Heidelberg, pp 295\u2013303"},{"key":"6778_CR56","doi-asserted-by":"crossref","unstructured":"Sun B, Li L, Zhou G, Wu X, He J, Yu L, Li D, Wei Q (2015) Combining multimodal features within a fusion network for emotion recognition in the wild. In: Proceedings of the 2015 ACM on international conference on multimodal interaction, ICMI \u201915. Association for Computing Machinery, New York, NY, USA, p. 497\u2013502 https:\/\/doi.org\/10.1145\/2818346.2830586","DOI":"10.1145\/2818346.2830586"},{"key":"6778_CR57","unstructured":"Wang F, Tax DMJ (2016) Survey on the attention based RNN model and its applications in computer vision. CoRR. http:\/\/arxiv.org\/abs\/1601.06823"},{"key":"6778_CR58","unstructured":"Wang K, Peng X, Yang J, Meng D, Qiao Y (2019) Region attention networks for pose and occlusion robust facial expression recognition. CoRR. http:\/\/arxiv.org\/abs\/1905.04075"},{"key":"6778_CR59","doi-asserted-by":"publisher","first-page":"6544","DOI":"10.1109\/TIP.2021.3093397","volume":"30","author":"Z Zhao","year":"2021","unstructured":"Zhao Z, Liu Q, Wang S (2021) Learning deep global multi-scale and local attention features for facial expression recognition in the wild. IEEE Trans Image Process 30:6544\u20136556","journal-title":"IEEE Trans Image Process"},{"key":"6778_CR60","doi-asserted-by":"crossref","unstructured":"Zhao Z, Liu Q, Zhou F (2021) Robust lightweight facial expression recognition network with label distribution training. 
In: Proceedings of the AAAI conference on artificial intelligence, 35:3510\u20133519","DOI":"10.1609\/aaai.v35i4.16465"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-021-06778-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-021-06778-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-021-06778-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,7]],"date-time":"2022-11-07T23:59:32Z","timestamp":1667865572000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-021-06778-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,13]]},"references-count":60,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["6778"],"URL":"https:\/\/doi.org\/10.1007\/s00521-021-06778-x","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,13]]},"assertion":[{"value":"21 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"The source code and trained model of our network are available at https:\/\/github.com\/minhnhatvt\/glamor-net.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}}]}}