{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T15:58:29Z","timestamp":1781798309884,"version":"3.54.5"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2021,3,10]],"date-time":"2021-03-10T00:00:00Z","timestamp":1615334400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,3,10]],"date-time":"2021-03-10T00:00:00Z","timestamp":1615334400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education","award":["NRF-2017R1A4A1015559"],"award-info":[{"award-number":["NRF-2017R1A4A1015559"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2021,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this study, we present a fusion model for emotion recognition based on visual data. The proposed model uses video information as its input and generates emotion labels for each video sample. Based on the video data, we first choose the most significant face regions with the use of a face detection and selection step. Subsequently, we employ three CNN-based architectures to extract the high-level features of the face image sequence. Furthermore, we adjusted one additional module for each CNN-based architecture to capture the sequential information of the entire video dataset. 
The combination of the three CNN-based models in a late-fusion-based approach yields a competitive result when compared to the baseline approach while using two public datasets: AFEW 2016 and SAVEE.<\/jats:p>","DOI":"10.1007\/s11227-021-03690-y","type":"journal-article","created":{"date-parts":[[2021,3,10]],"date-time":"2021-03-10T09:05:39Z","timestamp":1615367139000},"page":"10773-10790","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":37,"title":["Deep neural network-based fusion model for emotion recognition using visual data"],"prefix":"10.1007","volume":"77","author":[{"given":"Luu-Ngoc","family":"Do","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hyung-Jeong","family":"Yang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hai-Duong","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Soo-Hyung","family":"Kim","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guee-Sang","family":"Lee","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"In-Seop","family":"Na","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,3,10]]},"reference":[{"key":"3690_CR1","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1007\/978-3-319-08491-6_5","volume-title":"Human-computer systems interaction: backgrounds and applications","author":"A Ko\u0142akowska","year":"2014","unstructured":"Ko\u0142akowska A, Landowska A, Szwoch M, Szwoch W, Wr\u00f3bel MR (2014) Emotion recognition and its applications. In: Hippe ZS, Kulikowski JL, Mroczek T, Wtorek J (eds) Human-computer systems interaction: backgrounds and applications, vol 3. 
Springer, Cham, pp 51\u201362"},{"issue":"4","key":"3690_CR2","doi-asserted-by":"publisher","first-page":"1052","DOI":"10.1109\/TVT.2004.830974","volume":"53","author":"J Qiang","year":"2004","unstructured":"Qiang J, Zhiwei Z, Lan P (2004) Real-time nonintrusive monitoring and prediction of driver fatigue. IEEE Trans Veh Technol 53(4):1052\u20131068. https:\/\/doi.org\/10.1109\/TVT.2004.830974","journal-title":"IEEE Trans Veh Technol"},{"key":"3690_CR3","doi-asserted-by":"publisher","unstructured":"Rehg JM, Abowd GD, Rozga A, Romero M, Clements MA, Sclaroff S, Essa I, Ousley OY, Li Y, Kim C, Rao H, Kim JC, Presti LL, Zhang J, Lantsman D, Bidwell J, Ye Z (2013) Decoding children's social behavior. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition, pp 3414\u20133421. doi:https:\/\/doi.org\/10.1109\/CVPR.2013.438","DOI":"10.1109\/CVPR.2013.438"},{"issue":"10","key":"3690_CR4","doi-asserted-by":"publisher","first-page":"630","DOI":"10.1016\/j.imavis.2014.01.004","volume":"32","author":"D McDuff","year":"2014","unstructured":"McDuff D, El Kaliouby R, Senechal T, Demirdjian D, Picard R (2014) Automatic measurement of ad preferences from facial responses gathered over the Internet. Image Vis Comput 32(10):630\u2013640. https:\/\/doi.org\/10.1016\/j.imavis.2014.01.004","journal-title":"Image Vis Comput"},{"key":"3690_CR5","doi-asserted-by":"crossref","unstructured":"Yao A, Shao J, Ma N, Chen Y (2015) Capturing AU-aware facial features and their latent relations for emotion recognition in the wild. Paper presented at the Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, Washington, USA","DOI":"10.1145\/2818346.2830585"},{"key":"3690_CR6","unstructured":"Kahou SE, Michalski V, Konda K, Memisevic R, Pal C (2015) Recurrent neural networks for emotion recognition in video. 
Paper presented at the Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, Washington, USA"},{"key":"3690_CR7","doi-asserted-by":"crossref","unstructured":"Yao A, Cai D, Hu P, Wang S, Sha L, Chen Y (2016) HoloNet: towards robust emotion recognition in the wild. Paper presented at the Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan","DOI":"10.1145\/2993148.2997639"},{"issue":"11","key":"3690_CR8","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y Lecun","year":"1998","unstructured":"Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324. https:\/\/doi.org\/10.1109\/5.726791","journal-title":"Proc IEEE"},{"key":"3690_CR9","doi-asserted-by":"crossref","unstructured":"Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: 2015 IEEE International Conference on Computer Vision (ICCV), pp 4489\u20134497","DOI":"10.1109\/ICCV.2015.510"},{"key":"3690_CR10","doi-asserted-by":"crossref","unstructured":"Parkhi OM, Vedaldi A, Zisserman A (2015) Deep face recognition. Paper presented at the Proceedings of the British Machine Vision Conference (BMVC)","DOI":"10.5244\/C.29.41"},{"issue":"8","key":"3690_CR11","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735\u20131780. https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735","journal-title":"Neural Comput"},{"key":"3690_CR12","doi-asserted-by":"publisher","unstructured":"Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1800\u20131807. 
https:\/\/doi.org\/10.1109\/CVPR.2017.195","DOI":"10.1109\/CVPR.2017.195"},{"key":"3690_CR13","first-page":"125","volume":"10","author":"P Sarangi","year":"2017","unstructured":"Sarangi P, Mishra B, Dehuri S (2017) Pyramid histogram of oriented gradients based human ear identification. Int J Control Theory Appl 10:125\u2013133","journal-title":"Int J Control Theory Appl"},{"key":"3690_CR14","doi-asserted-by":"publisher","first-page":"803","DOI":"10.1016\/j.imavis.2008.08.005","volume":"27","author":"C Shan","year":"2009","unstructured":"Shan C, Gong S, McOwan P (2009) Facial expression recognition based on local binary patterns: a comprehensive study. Image Vis Comput 27:803\u2013816. https:\/\/doi.org\/10.1016\/j.imavis.2008.08.005","journal-title":"Image Vis Comput"},{"key":"3690_CR15","doi-asserted-by":"publisher","unstructured":"Walecki R, Rudovic O, Pavlovic V, Pantic M (2015) Variable-state latent conditional random fields for facial expression recognition and action unit detection. In: 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp 1\u20138. https:\/\/doi.org\/10.1109\/FG.2015.7163137","DOI":"10.1109\/FG.2015.7163137"},{"key":"3690_CR16","doi-asserted-by":"publisher","unstructured":"Kaya H, G\u00fcrp\u0131nar F, Afshar S, Salah AA (2015) Contrasting and combining least squares based learners for emotion recognition in the wild. In: 2015 ACM International Conference on Multimodal Interaction, pp 459\u2013466. https:\/\/doi.org\/10.1145\/2818346.2830588","DOI":"10.1145\/2818346.2830588"},{"key":"3690_CR17","doi-asserted-by":"publisher","DOI":"10.1145\/3065386","author":"A Krizhevsky","year":"2012","unstructured":"Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. Neural Inf Process Syst. 
https:\/\/doi.org\/10.1145\/3065386","journal-title":"Neural Inf Process Syst"},{"key":"3690_CR18","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556"},{"key":"3690_CR19","doi-asserted-by":"publisher","unstructured":"Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1\u20139. https:\/\/doi.org\/10.1109\/CVPR.2015.7298594","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"3690_CR20","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770\u2013778. https:\/\/doi.org\/10.1109\/CVPR.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"3690_CR21","unstructured":"Tang Y (2013) Deep learning using linear support vector machines. Paper presented at the Workshop on Challenges in Representation Learning, International Conference on Machine Learning"},{"key":"3690_CR22","unstructured":"Carrier PL, Courville A (2013) Challenges in representation learning: Facial expression recognition challenge. https:\/\/www.kaggle.com\/c\/challenges-in-representation-learning-facial-expression-recognition-challenge"},{"key":"3690_CR23","doi-asserted-by":"publisher","unstructured":"Mollahosseini A, Chan D, Mahoor MH (2016) Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp 1\u201310. https:\/\/doi.org\/10.1109\/WACV.2016.7477450","DOI":"10.1109\/WACV.2016.7477450"},{"key":"3690_CR24","doi-asserted-by":"publisher","unstructured":"Zhou S, Liang Y, Wan J, Li S (2016) Facial expression recognition based on multi-scale CNNs. 
In: 2016 11th Chinese Conference on Biometric Recognition, pp 503\u2013510. https:\/\/doi.org\/10.1007\/978-3-319-46654-5_55","DOI":"10.1007\/978-3-319-46654-5_55"},{"key":"3690_CR25","doi-asserted-by":"crossref","unstructured":"Yu Z, Zhang C (2015) Image based static facial expression recognition with multiple deep network learning. Paper presented at the Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, Washington, USA","DOI":"10.1145\/2818346.2830595"},{"key":"3690_CR26","doi-asserted-by":"publisher","unstructured":"Dhall A, Goecke R, Joshi J, Hoey J, Gedeon T (2016) EmotiW 2016: video and group-level emotion recognition challenges. https:\/\/doi.org\/10.1145\/2993148.2997638","DOI":"10.1145\/2993148.2997638"},{"key":"3690_CR27","doi-asserted-by":"crossref","unstructured":"Fan Y, Lu X, Li D, Liu Y (2016) Video-based emotion recognition using CNN-RNN and C3D hybrid networks. Paper presented at the Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan","DOI":"10.1145\/2993148.2997632"},{"key":"3690_CR28","doi-asserted-by":"crossref","unstructured":"Hu P, Ramanan D (2017) Finding tiny faces. Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition","DOI":"10.1109\/CVPR.2017.166"},{"key":"3690_CR29","unstructured":"Kingma D, Ba J (2014) Adam: a method for stochastic optimization. Paper presented at the International Conference on Learning Representations"},{"key":"3690_CR30","unstructured":"Haq S, Jackson PJB (2009) Speaker-dependent audio-visual emotion recognition. In: AVSP"},{"key":"3690_CR31","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1016\/j.inffus.2019.02.003","volume":"51","author":"M Amin-Naji","year":"2019","unstructured":"Amin-Naji M, Aghagolzadeh A, Ezoji M (2019) Ensemble of CNN for multi-focus image fusion. Inf Fusion 51:201\u2013214. 
https:\/\/doi.org\/10.1016\/j.inffus.2019.02.003","journal-title":"Inf Fusion"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-021-03690-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-021-03690-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-021-03690-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,22]],"date-time":"2023-10-22T20:54:22Z","timestamp":1698008062000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-021-03690-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,10]]},"references-count":31,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2021,10]]}},"alternative-id":["3690"],"URL":"https:\/\/doi.org\/10.1007\/s11227-021-03690-y","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,10]]},"assertion":[{"value":"13 February 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 March 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}