{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T18:02:01Z","timestamp":1775325721386,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,15]],"date-time":"2020-10-15T00:00:00Z","timestamp":1602720000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key Program of the Natural Science Foundation of Tianjin","award":["No. 18JCZDJC36300"],"award-info":[{"award-number":["No. 18JCZDJC36300"]}]},{"name":"National Key Research & Development Plan of China","award":["No.2017YFB1002804"],"award-info":[{"award-number":["No.2017YFB1002804"]}]},{"name":"National Natural Science Foundation of China","award":["No.61831022, No.61771472, No.61773379, No.61901473"],"award-info":[{"award-number":["No.61831022, No.61771472, No.61773379, No.61901473"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,16]]},"DOI":"10.1145\/3423327.3423672","type":"proceedings-article","created":{"date-parts":[[2020,10,15]],"date-time":"2020-10-15T23:26:39Z","timestamp":1602804399000},"page":"27-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":68,"title":["Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism"],"prefix":"10.1145","author":[{"given":"Licai","family":"Sun","sequence":"first","affiliation":[{"name":"University of Chinese Academy of Sciences & Institute of Automation, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zheng","family":"Lian","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences 
& University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianhua","family":"Tao","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mingyue","family":"Niu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240578"},{"key":"e_1_3_2_1_2_1","volume-title":"Layer normalization. arXiv preprint arXiv:1607.06450","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_3_2_1_3_1","volume-title":"Multimodal machine learning: A survey and taxonomy","author":"Baltru\u0161aitis Tadas","year":"2018","unstructured":"Tadas Baltru\u0161aitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE transactions on pattern analysis and machine intelligence, Vol. 
41, 2 (2018), 423--443."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2018.00019"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2993148.2993165"},{"key":"e_1_3_2_1_6_1","volume-title":"OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence","author":"Cao Zhe","year":"2019","unstructured":"Zhe Cao, Gines Hidalgo Martinez, Tomas Simon, Shih-En Wei, and Yaser A Sheikh. 2019. OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence (2019)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661806.2661811"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811634"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3347320.3357690"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2663204.2666277"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967286"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133944.3133949"},{"key":"e_1_3_2_1_13_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_1_14_1","volume-title":"Spatio-temporal encoder-decoder fully convolutional network for video-based dimensional emotion recognition","author":"Du Zhengyin","year":"2019","unstructured":"Zhengyin Du, Suowei Wu, Di Huang, Weixin Li, and Yunhong Wang. 2019. Spatio-temporal encoder-decoder fully convolutional network for video-based dimensional emotion recognition. IEEE Transactions on Affective Computing (2019)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2015.2457417"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874246"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0144610"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"e_1_3_2_1_21_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3266302.3266304"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133944.3133946"},{"key":"e_1_3_2_1_24_1","volume-title":"Efficient Modeling of Long Temporal Contexts for Continuous Emotion Recognition. 
In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). 185--191","author":"Huang J.","unstructured":"J. Huang, J. Tao, B. Liu, Z. Lian, and M. Niu. 2019. Efficient Modeling of Long Temporal Contexts for Continuous Emotion Recognition. In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). 185--191."},{"key":"e_1_3_2_1_25_1","volume-title":"Multimodal Transformer Fusion for Continuous Emotion Recognition. In ICASSP 2020--2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3507--3511","author":"Huang Jian","year":"2020","unstructured":"Jian Huang, Jianhua Tao, Bin Liu, Zheng Lian, and Mingyue Niu. 2020. Multimodal Transformer Fusion for Continuous Emotion Recognition. In ICASSP 2020--2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3507--3511."},{"key":"e_1_3_2_1_26_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_27_1","volume-title":"A concordance correlation coefficient to evaluate reproducibility. 
Biometrics","author":"Lin Lawrence I-Kuei","year":"1989","unstructured":"Lawrence I-Kuei Lin. 1989. A concordance correlation coefficient to evaluate reproducibility. Biometrics (1989), 255--268."},{"key":"e_1_3_2_1_28_1","volume-title":"Deep learning. Nature","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature, Vol. 521, 7553 (2015), 436--444."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2019-1577"},{"key":"e_1_3_2_1_30_1","volume-title":"A Survey on Contextual Embeddings. arXiv preprint arXiv:2003.07278","author":"Liu Qi","year":"2020","unstructured":"Qi Liu, Matt J Kusner, and Phil Blunsom. 2020. A Survey on Contextual Embeddings. arXiv preprint arXiv:2003.07278 (2020)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2018-1832"},{"key":"e_1_3_2_1_32_1","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111--3119."},{"key":"e_1_3_2_1_33_1","volume-title":"Proc. Interspeech 2019. 4559--4563","author":"Niu Mingyue","year":"2019","unstructured":"Mingyue Niu, Jianhua Tao, Bin Liu, and Cunhang Fan. 2019. Automatic Depression Level Detection via Lp-Norm Pooling. 
In Proc. Interspeech 2019. 4559--4563."},{"key":"e_1_3_2_1_34_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems. 8026--8037."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2017.02.003"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2512530.2512534"},{"key":"e_1_3_2_1_38_1","volume-title":"The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism","author":"Schuller Bj\u00f6rn","year":"2013","unstructured":"Bj\u00f6rn Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus Scherer, Fabien Ringeval, Mohamed Chetouani, Felix Weninger, Florian Eyben, Erik Marchi, et al. 2013. 
The INTERSPEECH 2013 computational paralinguistics challenge: Social signals, conflict, emotion, autism. In Proceedings INTERSPEECH 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France."},{"key":"e_1_3_2_1_39_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Bj\u00f6rn W Schuller, Iulia Lefter, Erik Cambria, and Ioannis Kompatsiaris. 2020. MuSe 2020 Challenge and Workshop: Multimodal Sentiment Analysis, Emotion-target Engagement and Trustworthiness Detection in Real-life Media. In 1st International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop co-located with the 28th ACM International Conference on Multimedia (ACM MM). 
ACM.","DOI":"10.1145\/3423327.3423673"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472669"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1656"},{"key":"e_1_3_2_1_43_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2012.03.001"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compedu.2019.103649"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1115"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3266302.3266313"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3347320.3357692"},{"key":"e_1_3_2_1_49_1","first-page":"1","article-title":"Affective computing for large-scale heterogeneous multimedia data: A survey","volume":"15","author":"Zhao Sicheng","year":"2019","unstructured":"Sicheng Zhao, Shangfei Wang, Mohammad Soleymani, Dhiraj Joshi, and Qiang Ji. 2019b. Affective computing for large-scale heterogeneous multimedia data: A survey. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Vol. 
15, 3s (2019), 1--32.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","location":"Seattle WA USA","acronym":"MM '20","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3423327.3423672","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3423327.3423672","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:57Z","timestamp":1750195497000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3423327.3423672"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,15]]},"references-count":49,"alternative-id":["10.1145\/3423327.3423672","10.1145\/3423327"],"URL":"https:\/\/doi.org\/10.1145\/3423327.3423672","relation":{},"subject":[],"published":{"date-parts":[[2020,10,15]]},"assertion":[{"value":"2020-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}