{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T01:28:56Z","timestamp":1760318936191,"version":"3.37.3"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"28-29","license":[{"start":{"date-parts":[[2020,9,18]],"date-time":"2020-09-18T00:00:00Z","timestamp":1600387200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,9,18]],"date-time":"2020-09-18T00:00:00Z","timestamp":1600387200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"published-print":{"date-parts":[[2021,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>People generally perceive other people\u2019s emotions from both speech and facial expressions, so using speech signals and facial images together can be helpful. However, because speech and image data have different characteristics, combining the two inputs remains a challenging problem in emotion-recognition research. In this paper, we propose a method that recognizes emotions by synchronizing speech signals with image sequences. We design three deep networks. The first network is trained on image sequences and focuses on changes in facial expression. The second network takes facial landmarks as input to reflect facial motion. For the third network, the speech signals are first converted into acoustic features that are synchronized with the image sequence and used as its input. The three networks are combined using a novel weighted integration method to boost emotion-recognition performance. The proposed method is verified in a test comparing recognition accuracy. 
The results show that the proposed method is more accurate than the methods of previous studies.<\/jats:p>","DOI":"10.1007\/s11042-020-09842-1","type":"journal-article","created":{"date-parts":[[2020,9,18]],"date-time":"2020-09-18T19:02:31Z","timestamp":1600455751000},"page":"35871-35885","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Human emotion recognition based on the weighted integration method using image sequences and acoustic features"],"prefix":"10.1007","volume":"80","author":[{"given":"Sung-Woo","family":"Byun","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2520-6681","authenticated-orcid":false,"given":"Seok-Pil","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,9,18]]},"reference":[{"unstructured":"Schuller B, Steidl S, Batliner A, Vinciarelli A, Scherer K, Ringeval F, Chetouani M, Weninger F, Eyben F, Marchi E, Mortillaro M, Salamin H, Polychroniou A, Valente F, Kim S (2013) Interspeech 2013 Computational Paralinguistics Challenge: Social Signals, Conflict, Emotion, Autism","key":"9842_CR1"},{"key":"9842_CR2","doi-asserted-by":"publisher","first-page":"7714","DOI":"10.3390\/s130607714","volume":"13","author":"D Ghimire","year":"2013","unstructured":"Ghimire D, Lee J (2013) Geometric feature-based facial expression recognition in image sequences using multi-class AdaBoost and support vector machines. Sensors 13:7714\u20137734. https:\/\/doi.org\/10.3390\/s130607714","journal-title":"Sensors"},{"key":"9842_CR3","doi-asserted-by":"publisher","first-page":"101646","DOI":"10.1016\/j.bspc.2019.101646","volume":"55","author":"JA Dom\u00ednguez-Jim\u00e9nez","year":"2020","unstructured":"Dom\u00ednguez-Jim\u00e9nez JA, Campo-Landines KC, Mart\u00ednez-Santos J, Delahoz EJ, Contreras-Ortiz S (2020) A machine learning model for emotion recognition from physiological signals. 
Biomed Signal Process Control 55:101646. https:\/\/doi.org\/10.1016\/j.bspc.2019.101646","journal-title":"Biomed Signal Process Control"},{"key":"9842_CR4","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1016\/j.patcog.2010.09.020","volume":"44","author":"M El Ayadi","year":"2011","unstructured":"El Ayadi M, Kamel MS, Karray F (2011) Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recogn 44:572\u2013587. https:\/\/doi.org\/10.1016\/j.patcog.2010.09.020","journal-title":"Pattern Recogn"},{"key":"9842_CR5","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1109\/TAFFC.2015.2457417","volume":"7","author":"F Eyben","year":"2016","unstructured":"Eyben F, Scherer KR, Schuller BW et al (2016) The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans Affect Comput 7:190\u2013202. https:\/\/doi.org\/10.1109\/TAFFC.2015.2457417","journal-title":"IEEE Trans Affect Comput"},{"key":"9842_CR6","doi-asserted-by":"publisher","first-page":"7803","DOI":"10.1007\/s11042-016-3418-y","volume":"76","author":"D Ghimire","year":"2017","unstructured":"Ghimire D, Jeong S, Lee J, Park SH (2017) Facial expression recognition based on local region specific features and support vector machines. Multimed Tools Appl 76:7803\u20137821. https:\/\/doi.org\/10.1007\/s11042-016-3418-y","journal-title":"Multimed Tools Appl"},{"unstructured":"Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press. https:\/\/www.deeplearningbook.org. Accessed 1 Mar 2020","key":"9842_CR7"},{"key":"9842_CR8","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1016\/j.jneumeth.2011.06.023","volume":"200","author":"J Hamm","year":"2011","unstructured":"Hamm J, Kohler CG, Gur RC, Verma R (2011) Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. 
J Neurosci Methods 200:237\u2013256. https:\/\/doi.org\/10.1016\/j.jneumeth.2011.06.023","journal-title":"J Neurosci Methods"},{"doi-asserted-by":"crossref","unstructured":"Happy SL, George A, Routray A (2012) A real time facial expression classification system using local binary patterns. In: Proc 4th Int Conf Intell Human Comput Interact (IHCI), pp 1\u20135","key":"9842_CR9","DOI":"10.1109\/IHCI.2012.6481802"},{"doi-asserted-by":"publisher","unstructured":"Hasani B, Mahoor MH (2017) Facial expression recognition using enhanced deep 3D convolutional neural networks. In: IEEE Conf Comput Vision Pattern Recognit Workshops (CVPRW). https:\/\/doi.org\/10.1109\/CVPRW.2017.282","key":"9842_CR10","DOI":"10.1109\/CVPRW.2017.282"},{"key":"9842_CR11","doi-asserted-by":"publisher","first-page":"5546","DOI":"10.3837\/tiis.2019.11.015","volume":"13","author":"J He","year":"2019","unstructured":"He J, Li D, Bo S, Yu L (2019) Facial action unit detection with multilayer fused multi-task and multi-label deep learning network. KSII Trans Internet Inf Syst 13:5546\u20135559. https:\/\/doi.org\/10.3837\/tiis.2019.11.015","journal-title":"KSII Trans Internet Inf Syst"},{"key":"9842_CR12","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1016\/j.inffus.2018.09.008","volume":"49","author":"MS Hossain","year":"2019","unstructured":"Hossain MS, Muhammad G (2019) Emotion recognition using deep learning approach from audio\u2013visual emotional big data. Inf Fusion 49:69\u201378. https:\/\/doi.org\/10.1016\/j.inffus.2018.09.008","journal-title":"Inf Fusion"},{"doi-asserted-by":"crossref","unstructured":"Hutto CJ, Gilbert E (2014) VADER: A parsimonious rule-based model for sentiment analysis of social media text. In: Proc Eighth Int AAAI Conf Weblogs Soc Media (ICWSM)","key":"9842_CR13","DOI":"10.1609\/icwsm.v8i1.14550"},{"doi-asserted-by":"crossref","unstructured":"Iliou T, Anagnostopoulos C-N (2009) Statistical evaluation of speech features for emotion recognition. 
In: 4th Int Conf Digital Telecommunications (ICDT\u201909), IEEE, pp 121\u2013126","key":"9842_CR14","DOI":"10.1109\/ICDT.2009.30"},{"key":"9842_CR15","doi-asserted-by":"publisher","first-page":"924","DOI":"10.3837\/tiis.2020.03.001","volume":"14","author":"X Jia","year":"2020","unstructured":"Jia X, Li W, Wang Y, Hong S, Su X (2020) An action unit co-occurrence constraint 3DCNN based action unit recognition approach. KSII Trans Internet Inf Syst 14:924\u2013942. https:\/\/doi.org\/10.3837\/tiis.2020.03.001","journal-title":"KSII Trans Internet Inf Syst"},{"unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2015) You Only Look Once: Unified, Real-Time Object Detection. arXiv preprint arXiv:1506.02640","key":"9842_CR16"},{"doi-asserted-by":"publisher","unstructured":"Jung H, Lee S, Yim J, Park S, Kim J (2015) Joint fine-tuning in deep neural networks for facial expression recognition. In: 2015 IEEE Int Conf Comput Vision (ICCV). https:\/\/doi.org\/10.1109\/ICCV.2015.341","key":"9842_CR17","DOI":"10.1109\/ICCV.2015.341"},{"doi-asserted-by":"crossref","unstructured":"Kao YH, Lee LS (2006) Feature analysis for emotion recognition from Mandarin speech considering the special characteristics of Chinese language. In: Interspeech","key":"9842_CR18","DOI":"10.21437\/Interspeech.2006-501"},{"doi-asserted-by":"crossref","unstructured":"Kaulard K, Cunningham DW, B\u00fclthoff HH, Wallraven C (2012) The MPI facial expression database\u2014A validated database of emotional and conversational facial expressions. PLoS One 7:e32321. https:\/\/doi.org\/10.1371\/journal.pone.0032321","key":"9842_CR19","DOI":"10.1371\/journal.pone.0032321"},{"key":"9842_CR20","doi-asserted-by":"publisher","first-page":"1159","DOI":"10.1016\/j.patrec.2013.03.022","volume":"34","author":"RA Khan","year":"2013","unstructured":"Khan RA, Meyer A, Konik H, Bouakaz S (2013) Framework for reliable, real-time facial expression recognition for low resolution images. Pattern Recogn Lett 34:1159\u20131168. 
https:\/\/doi.org\/10.1016\/j.patrec.2013.03.022","journal-title":"Pattern Recogn Lett"},{"doi-asserted-by":"publisher","unstructured":"Ko BC (2018) A brief review of facial emotion recognition based on visual information. Sensors 18(2):401. https:\/\/doi.org\/10.3390\/s18020401","key":"9842_CR21","DOI":"10.3390\/s18020401"},{"doi-asserted-by":"publisher","unstructured":"LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436\u2013444. https:\/\/doi.org\/10.1038\/nature14539","key":"9842_CR22","DOI":"10.1038\/nature14539"},{"key":"9842_CR23","doi-asserted-by":"publisher","first-page":"2422","DOI":"10.1121\/1.4878044","volume":"135","author":"C Lee","year":"2014","unstructured":"Lee C, Lui S, So C (2014) Visualization of time-varying joint development of pitch and dynamics for speech emotion recognition. J Acoust Soc Am 135:2422. https:\/\/doi.org\/10.1121\/1.4878044","journal-title":"J Acoust Soc Am"},{"doi-asserted-by":"publisher","unstructured":"Li S, Deng W (2020) Deep facial expression recognition: A survey. IEEE Trans Affective Comp (Early Access). https:\/\/doi.org\/10.1109\/TAFFC.2020.2981446","key":"9842_CR24","DOI":"10.1109\/TAFFC.2020.2981446"},{"doi-asserted-by":"publisher","unstructured":"Liu M, Li S, Shan S, Wang R, Chen X (2014) Deeply learning deformable facial action parts model for dynamic expression analysis. In: Asian Conference on Computer Vision (ACCV 2014), pp 143\u2013157. https:\/\/doi.org\/10.1007\/978-3-319-16817-3_10","key":"9842_CR25","DOI":"10.1007\/978-3-319-16817-3_10"},{"doi-asserted-by":"publisher","unstructured":"Lotfian R, Busso C (2019) Curriculum learning for speech emotion recognition from crowdsourced labels. IEEE\/ACM Trans Audio, Speech Lang Processing 27(4). https:\/\/doi.org\/10.1109\/TASLP.2019.2898816","key":"9842_CR26","DOI":"10.1109\/TASLP.2019.2898816"},{"doi-asserted-by":"crossref","unstructured":"Luengo I, Navas E, Hern\u00e1ez I, S\u00e1nchez J (2005) Automatic emotion recognition using prosodic parameters. 
In: Interspeech, 493\u2013496","key":"9842_CR27","DOI":"10.21437\/Interspeech.2005-324"},{"key":"9842_CR28","doi-asserted-by":"publisher","first-page":"184","DOI":"10.1016\/j.inffus.2018.06.003","volume":"46","author":"Y Ma","year":"2019","unstructured":"Ma Y, Hao Y, Chen M, Chen J, Lu P, Ko\u0161ir A (2019) Audio-visual emotion fusion (AVEF): A deep efficient weighted approach. Inf Fusion 46:184\u2013192. https:\/\/doi.org\/10.1016\/j.inffus.2018.06.003","journal-title":"Inf Fusion"},{"key":"9842_CR29","first-page":"53","volume":"2","author":"A Mehrabian","year":"1968","unstructured":"Mehrabian A (1968) Communication without words. Psychol Today 2:53\u201356","journal-title":"Psychol Today"},{"unstructured":"Jeong M, Ko BC, Nam JY (2016) Facial landmark detection based on an ensemble of local weighted regressors during real driving situation. In: Int Conf Pattern Recognit (ICPR), pp 1\u20136","key":"9842_CR30"},{"key":"9842_CR31","doi-asserted-by":"publisher","first-page":"2753","DOI":"10.1109\/TCSVT.2017.2769096","volume":"28","author":"M Jeong","year":"2018","unstructured":"Jeong M, Ko BC, Kwak S, Nam JY (2018) Driver facial landmark detection in real driving situations. IEEE Trans Circuits Syst Video Technol 28:2753\u20132767. https:\/\/doi.org\/10.1109\/TCSVT.2017.2769096","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"issue":"2","key":"9842_CR32","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1007\/s10772-012-9172-2","volume":"16","author":"KS Rao","year":"2013","unstructured":"Rao KS, Koolagudi SG, Vempada RR (2013) Emotion recognition from speech using global and local prosodic features. Int J Speech Technol 16(2):143\u2013160. https:\/\/doi.org\/10.1007\/s10772-012-9172-2","journal-title":"Int J Speech Technol"},{"doi-asserted-by":"publisher","unstructured":"Scherer KR (2003) Vocal communication of emotion: A review of research paradigms. Speech Comm 40:227\u2013256. https:\/\/doi.org\/10.1016\/S0167-6393(02)00084-5. 
https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0167639302000845. Accessed 1 Mar 2020","key":"9842_CR33","DOI":"10.1016\/S0167-6393(02)00084-5"},{"issue":"9\u201310","key":"9842_CR34","doi-asserted-by":"publisher","first-page":"1062","DOI":"10.1016\/j.specom.2011.01.011","volume":"53","author":"B Schuller","year":"2011","unstructured":"Schuller B, Batliner A, Steidl S, Seppi D (2011) Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge. Speech Comm 53(9\u201310):1062\u20131087. https:\/\/doi.org\/10.1016\/j.specom.2011.01.011","journal-title":"Speech Comm"},{"key":"9842_CR35","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1016\/j.procs.2019.04.009","volume":"151","author":"FA Shaqr","year":"2019","unstructured":"Shaqr FA, Duwairi R, Al-Ayyoub M (2019) Recognizing emotion from speech based on age and gender using hierarchical models. Procedia Comput Sci 151:37\u201344. https:\/\/doi.org\/10.1016\/j.procs.2019.04.009","journal-title":"Procedia Comput Sci"},{"key":"9842_CR36","doi-asserted-by":"publisher","first-page":"1386","DOI":"10.1109\/TIP.2015.2405346","volume":"24","author":"MH Siddiqi","year":"2015","unstructured":"Siddiqi MH, Ali R, Khan AM, Park YT, Lee S (2015) Human facial expression recognition using stepwise linear discriminant analysis and hidden conditional random fields. IEEE Trans Image Process 24:1386\u20131398. https:\/\/doi.org\/10.1109\/TIP.2015.2405346","journal-title":"IEEE Trans Image Process"},{"doi-asserted-by":"publisher","unstructured":"Song P, Zheng W (2018) Feature selection based transfer subspace learning for speech emotion recognition. 
IEEE Trans Affective Comput (Early Access). https:\/\/doi.org\/10.1109\/TAFFC.2018.2800046","key":"9842_CR37","DOI":"10.1109\/TAFFC.2018.2800046"},{"key":"9842_CR38","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/j.patrec.2017.10.022","volume":"119","author":"N Sun","year":"2019","unstructured":"Sun N, Qi L, Huan R, Liu J, Han G (2019) Deep spatial-temporal feature fusion for facial expression recognition in static images. Pattern Recogn Lett 119:49\u201361. https:\/\/doi.org\/10.1016\/j.patrec.2017.10.022","journal-title":"Pattern Recogn Lett"},{"key":"9842_CR39","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/s10772-018-9491-z","volume":"21","author":"M Swain","year":"2018","unstructured":"Swain M, Routray A, Kabisatpathy P (2018) Databases, features and classifiers for speech emotion recognition: A review. Int J Speech Technol 21:93\u2013120. https:\/\/doi.org\/10.1007\/s10772-018-9491-z","journal-title":"Int J Speech Technol"},{"doi-asserted-by":"publisher","unstructured":"Wang X, Chen X, Cao C (2020) Human emotion recognition by optimally fusing facial expression and speech feature. Signal Process Image Commun. https:\/\/doi.org\/10.1016\/j.image.2020.115831","key":"9842_CR40","DOI":"10.1016\/j.image.2020.115831"},{"doi-asserted-by":"publisher","unstructured":"Wu CH, Yeh JF, Chuang ZJ (2009) Emotion perception and recognition from speech. In: Affective Inf Processing, pp 93\u2013110. https:\/\/doi.org\/10.1007\/978-1-84800-306-4_6","key":"9842_CR41","DOI":"10.1007\/978-1-84800-306-4_6"},{"doi-asserted-by":"publisher","unstructured":"Xiong X, De la Torre F (2013) Supervised descent method and its applications to face alignment. 
In: 2013 IEEE Conf Comput Vision Pattern Recognit (CVPR). https:\/\/doi.org\/10.1109\/CVPR.2013.75","key":"9842_CR42","DOI":"10.1109\/CVPR.2013.75"},{"doi-asserted-by":"publisher","unstructured":"Zamil AAA, Hasan S, Baki SJ, Adam J, Zaman I (2019) Emotion detection from speech signals using voting mechanism on classified frames. In: 2019 Int Conf Robotics, Electr Signal Processing Technol (ICREST). https:\/\/doi.org\/10.1109\/ICREST.2019.8644168","key":"9842_CR43","DOI":"10.1109\/ICREST.2019.8644168"},{"key":"9842_CR44","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/j.patrec.2019.12.013","volume":"131","author":"H Zhang","year":"2020","unstructured":"Zhang H, Huang B, Tian G (2020) Facial expression recognition based on deep convolution long short-term memory networks of double-channel weighted mixture. Pattern Recogn Lett 131:128\u2013134. https:\/\/doi.org\/10.1016\/j.patrec.2019.12.013","journal-title":"Pattern Recogn Lett"},{"key":"9842_CR45","doi-asserted-by":"publisher","first-page":"1576","DOI":"10.1109\/TMM.2017.2766843","volume":"20","author":"S Zhang","year":"2018","unstructured":"Zhang S, Zhang S, Huang T, Gao W (2018) Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching. IEEE Trans Multimed 20:1576\u20131590. https:\/\/doi.org\/10.1109\/TMM.2017.2766843","journal-title":"IEEE Trans Multimed"},{"key":"9842_CR46","doi-asserted-by":"publisher","first-page":"2528","DOI":"10.1109\/TMM.2016.2598092","volume":"18","author":"T Zhang","year":"2016","unstructured":"Zhang T, Zheng W, Cui Z, Zong Y, Yan J, Yan K (2016) A deep neural network-driven feature learning method for multi-view facial expression recognition. IEEE Trans Multimed 18:2528\u20132536. 
https:\/\/doi.org\/10.1109\/TMM.2016.2598092","journal-title":"IEEE Trans Multimed"},{"key":"9842_CR47","doi-asserted-by":"publisher","first-page":"312","DOI":"10.1016\/j.bspc.2018.08.035","volume":"47","author":"J Zhao","year":"2019","unstructured":"Zhao J, Mao X, Chen L (2019) Speech emotion recognition using deep 1D & 2D CNN LSTM networks. Biomed Signal Processing Control 47:312\u2013323. https:\/\/doi.org\/10.1016\/j.bspc.2018.08.035","journal-title":"Biomed Signal Processing Control"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-020-09842-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-020-09842-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-020-09842-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,19]],"date-time":"2022-11-19T05:05:36Z","timestamp":1668834336000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-020-09842-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,18]]},"references-count":47,"journal-issue":{"issue":"28-29","published-print":{"date-parts":[[2021,11]]}},"alternative-id":["9842"],"URL":"https:\/\/doi.org\/10.1007\/s11042-020-09842-1","relation":{},"ISSN":["1380-7501","1573-7721"],"issn-type":[{"type":"print","value":"1380-7501"},{"type":"electronic","value":"1573-7721"}],"subject":[],"published":{"date-parts":[[2020,9,18]]},"assertion":[{"value":"4 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 July 
2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 September 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}