{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T18:21:52Z","timestamp":1781720512987,"version":"3.54.5"},"reference-count":37,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,9,26]],"date-time":"2021-09-26T00:00:00Z","timestamp":1632614400000},"content-version":"vor","delay-in-days":268,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"He\u2019nan Educational Committee","award":["21A520006"],"award-info":[{"award-number":["21A520006"]}]},{"name":"He\u2019nan Educational Committee","award":["182102310919"],"award-info":[{"award-number":["182102310919"]}]},{"name":"Scientific and Technological Research Project of Henan Provincial Science and Technology Department","award":["21A520006"],"award-info":[{"award-number":["21A520006"]}]},{"name":"Scientific and Technological Research Project of Henan Provincial Science and Technology Department","award":["182102310919"],"award-info":[{"award-number":["182102310919"]}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computational Intelligence and Neuroscience"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>The context, such as scenes and objects, plays an important role in video emotion recognition. The emotion recognition accuracy can be further improved when the context information is incorporated. Although previous research has considered the context information, the emotional clues contained in different images may be different, which is often ignored. To address the problem of emotion difference between different modes and different images, this paper proposes a hierarchical attention\u2010based multimodal fusion network for video emotion recognition, which consists of a multimodal feature extraction module and a multimodal feature fusion module. The multimodal feature extraction module has three subnetworks used to extract features of facial, scene, and global images. Each subnetwork consists of two branches, where the first branch extracts the features of different modes, and the other branch generates the emotion score for each image. Features and emotion scores of all images in a modal are aggregated to generate the emotion feature of the modal. The other module takes multimodal features as input and generates the emotion score for each modal. Finally, features and emotion scores of multiple modes are aggregated, and the final emotion representation of the video will be produced. Experimental results show that our proposed method is effective on the emotion recognition dataset.<\/jats:p>","DOI":"10.1155\/2021\/5585041","type":"journal-article","created":{"date-parts":[[2021,9,26]],"date-time":"2021-09-26T18:24:41Z","timestamp":1632680681000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Hierarchical Attention\u2010Based Multimodal Fusion Network for Video Emotion Recognition"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5180-9665","authenticated-orcid":false,"given":"Xiaodong","family":"Liu","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Songyang","family":"Li","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Miao","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"311","published-online":{"date-parts":[[2021,9,26]]},"reference":[{"key":"e_1_2_10_1_2","doi-asserted-by":"publisher","DOI":"10.1109\/thms.2017.2695613"},{"key":"e_1_2_10_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2016.2547397"},{"key":"e_1_2_10_3_2","doi-asserted-by":"crossref","unstructured":"LiuM. WangR. LiS. ShanS. HuangZ. andChenX. Combining multiple kernel methods on Riemannian Manifold for emotion recognition in the wild Proceedings of the International Conference on Multimodal Interaction ACM ICMI 2014 Istanbul Turkey 494\u2013501.","DOI":"10.1145\/2663204.2666274"},{"key":"e_1_2_10_4_2","unstructured":"YaoA. CaiD. HuP. WangS. ShaL. andHoloNetY. C. Towards robust emotion recognition in the wild Proceedings of the 18th ACM International Conference on Multimodal Interaction November 2016 Tokyo Japan 472\u2013478."},{"key":"e_1_2_10_5_2","doi-asserted-by":"crossref","unstructured":"PiniS.andAhmedB. Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild Proceedings of the ACM International Conference on Multimodal Interaction November 2017 Glasgow Scotland 536\u2013544.","DOI":"10.1145\/3136755.3143006"},{"key":"e_1_2_10_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.01.017"},{"key":"e_1_2_10_7_2","doi-asserted-by":"publisher","DOI":"10.1177\/0963721411422522"},{"key":"e_1_2_10_8_2","doi-asserted-by":"crossref","unstructured":"HessU.andHareliS. The influence of context on emotion recognition in humans Proceedings of the 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition May 2015 Ljubljana Slovenia 1\u20136.","DOI":"10.1109\/FG.2015.7284842"},{"key":"e_1_2_10_9_2","doi-asserted-by":"crossref","unstructured":"KostiR. AlvarezJ. M. RecasensA. andLapedrizaA. Emotion recognition in context Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) July 2017 Honolulu HI USA 1667\u20131675.","DOI":"10.1109\/CVPR.2017.212"},{"key":"e_1_2_10_10_2","doi-asserted-by":"crossref","unstructured":"ChenC. WuZ. andJiangY. G. Emotion in context: deep semantic feature fusion for video emotion recognition Proceedings of the ACM on Multimedia Conference October 2016 Amsterdam The Netherlands 127\u2013131.","DOI":"10.1145\/2964284.2967196"},{"key":"e_1_2_10_11_2","doi-asserted-by":"publisher","DOI":"10.1037\/h0030377"},{"key":"e_1_2_10_12_2","doi-asserted-by":"crossref","unstructured":"LiuY. YanJ. andOuyangW. Quality aware network for set to set recognition Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition July 2017 Honolulu HI USA 4694\u20134703.","DOI":"10.1109\/CVPR.2017.499"},{"key":"e_1_2_10_13_2","doi-asserted-by":"crossref","unstructured":"LongX. GanC. MeloG. D.et al. Attention clusters: purely attention based local feature integration for video classification Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition June 2018 Salt Lake City UT USA 7834\u20137843.","DOI":"10.1109\/CVPR.2018.00817"},{"key":"e_1_2_10_14_2","doi-asserted-by":"crossref","unstructured":"YaoA. CaiD. andHuP. HoloNet: towards robust emotion recognition in the wild Proceedings of the Acm International Conference on Multimodal Interaction November 2016 Tokyo Japan 472\u2013478.","DOI":"10.1145\/2993148.2997639"},{"key":"e_1_2_10_15_2","doi-asserted-by":"crossref","unstructured":"SunM. C. HsuS. H. andYangM. C. Context-aware cascade attention-based RNN for video emotion recognition Proceedings of the First Asian Conference on Affective Computing and Intelligent Interaction May 2018 Beijing China 1\u20136 https:\/\/doi.org\/10.1109\/aciiasia.2018.8470372 2-s2.0-85055562587.","DOI":"10.1109\/ACIIAsia.2018.8470372"},{"key":"e_1_2_10_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/access.2021.3091169"},{"key":"e_1_2_10_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2017.09.010"},{"key":"e_1_2_10_18_2","doi-asserted-by":"publisher","DOI":"10.3390\/s20082169"},{"key":"e_1_2_10_19_2","doi-asserted-by":"crossref","unstructured":"VielzeufV. PateuxS. andJurieF. Temporal multimodal fusion for video emotion classification in the wild Proceedings of the 19th ACM International Conference on Multimodal Interaction November 2017 Glasgow Scotland 569\u2013576.","DOI":"10.1145\/3136755.3143011"},{"key":"e_1_2_10_20_2","doi-asserted-by":"crossref","unstructured":"ZhangY. DuJ. andWangZ. Attention based fully convolutional network for speech emotion recognition Proceedings of the 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference November 2018 Honolulu HI USA 1771\u20131775.","DOI":"10.23919\/APSIPA.2018.8659587"},{"key":"e_1_2_10_21_2","doi-asserted-by":"crossref","unstructured":"LeeJ. KimS. KimS. andSohnK. Spatiotemporal attention based deep neural networks for emotion recognition Proceedings of the International Conference on Acoustics Speech and Signal Processing April 2018 Calgary Canada 1\u20135.","DOI":"10.1109\/ICASSP.2018.8461920"},{"key":"e_1_2_10_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.01.096"},{"key":"e_1_2_10_23_2","doi-asserted-by":"crossref","unstructured":"LiuJ. SuY. andLiuY. Multi-modal emotion recognition with temporal-band Attention based on LSTM-RNN Proceedings of the Advances in Multimedia Information Processing 2017 Harbin China 194\u2013204.","DOI":"10.1007\/978-3-319-77380-3_19"},{"key":"e_1_2_10_24_2","doi-asserted-by":"crossref","unstructured":"HuangC. W.andNarayananS. S. Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME) July 2017 Hong Kong China 583\u2013588.","DOI":"10.1109\/ICME.2017.8019296"},{"key":"e_1_2_10_25_2","unstructured":"FanL.andYunjieK. Spatiotemporal networks for video emotion recognition 2017 https:\/\/arxiv.org\/abs\/1704.00570."},{"key":"e_1_2_10_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/tmm.2018.2808760"},{"key":"e_1_2_10_27_2","doi-asserted-by":"crossref","unstructured":"XuB. ZhengY. andYeH. Video motion recognition with concept selection Proceedings of the 2019 IEEE International Conference on Multimedia and Expo July 2019 Shanghai China 406\u2013411.","DOI":"10.1109\/ICME.2019.00077"},{"key":"e_1_2_10_28_2","doi-asserted-by":"crossref","unstructured":"SchroffF. KalenichenkoD. andJamesP. Facenet: a unified embedding for face recognition and clustering Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition June 2015 Boston MA USA 815\u2013823.","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_2_10_29_2","unstructured":"RenS. HeK. andGirshickR. Faster R-CNN: towards real-time object detection with region proposal networks Proceedings of the International Conference on Neural Information Processing Systems December 2015 Montreal Canada 91\u201399."},{"key":"e_1_2_10_30_2","doi-asserted-by":"crossref","unstructured":"YangS. Luo\u2009P. Chen\u2009C. L. andTangX. Wider face: a face detection benchmark Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition July 2016 Las Vegas NV USA 5525\u20135533.","DOI":"10.1109\/CVPR.2016.596"},{"key":"e_1_2_10_31_2","doi-asserted-by":"crossref","unstructured":"ParkhiO. M. VedaldiA. andZissermanA. Deep face recognition Proceedings of the British Machine Vision Conference September 2015 Swansea UK 1\u201312.","DOI":"10.5244\/C.29.41"},{"key":"e_1_2_10_32_2","unstructured":"ZhouB. LapedrizaA. XiaoJ. TorralbaA. andOlivaA. Learning deep features for scene recognition using places database Proceedings of the 27th International Conference on Neural Information Processing Systems November 2014 Bangkok Thailand 487\u2013495."},{"key":"e_1_2_10_33_2","doi-asserted-by":"crossref","unstructured":"XuB. FuY. JiangY.-G. LiB. andSigalL. Video emotion recognition with transferred deep feature encodings Proceedings of the Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval June 2016 New York NY USA 15\u201322.","DOI":"10.1145\/2911996.2912006"},{"key":"e_1_2_10_34_2","doi-asserted-by":"crossref","unstructured":"JiangY.-G. XuB. andXueX. Predicting emotions in user-generated videos Proceedings of the Proceedings of the 28th AAAI Conference on Artificial Intelligence July 2014 Qu\u00e9bec Canada 73\u201379.","DOI":"10.1609\/aaai.v28i1.8724"},{"key":"e_1_2_10_35_2","doi-asserted-by":"crossref","unstructured":"FeichtenhoferC. PinzA. andZissermanA. Convolutional two-stream network fusion for video action recognition Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition June 2016 Las Vegas NV USA 1933\u20131941.","DOI":"10.1109\/CVPR.2016.213"},{"key":"e_1_2_10_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/access.2020.3048693"},{"key":"e_1_2_10_37_2","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/8843413"}],"container-title":["Computational Intelligence and Neuroscience"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/5585041.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/5585041.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/5585041","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T10:39:48Z","timestamp":1722940788000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/5585041"}},"subtitle":[],"editor":[{"given":"Qiangqiang","family":"Yuan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/5585041"],"URL":"https:\/\/doi.org\/10.1155\/2021\/5585041","archive":["Portico"],"relation":{},"ISSN":["1687-5265","1687-5273"],"issn-type":[{"value":"1687-5265","type":"print"},{"value":"1687-5273","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2021-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"5585041"}}