{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T18:35:43Z","timestamp":1770748543641,"version":"3.50.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the National Natural Science Foundation of China","award":["72188101, 61725203, 61976078"],"award-info":[{"award-number":["72188101, 61725203, 61976078"]}]},{"name":"the National Key R&D Program of China","award":["2019YFA0706203"],"award-info":[{"award-number":["2019YFA0706203"]}]},{"name":"the Anhui Provincial Natural Science Foundation","award":["2208085QF191"],"award-info":[{"award-number":["2208085QF191"]}]},{"name":"the Fundamental Research Funds for the Central Universities","award":["JZ2022HGQA0164"],"award-info":[{"award-number":["JZ2022HGQA0164"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3551876.3554809","type":"proceedings-article","created":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T22:17:21Z","timestamp":1664403441000},"page":"81-88","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis"],"prefix":"10.1145","author":[{"given":"Jia","family":"Li","sequence":"first","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Ziyang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Junjie","family":"Lang","sequence":"additional","affiliation":[{"name":"Hefei 
University of Technology, Hefei, China"}]},{"given":"Yueqi","family":"Jiang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Liuwei","family":"An","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Peng","family":"Zou","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Yangyang","family":"Xu","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Sheng","family":"Gao","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Jie","family":"Lin","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Chunxiao","family":"Fan","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Xiao","family":"Sun","sequence":"additional","affiliation":[{"name":"Hefei University of Technology & ZhongJuYuan Intelligent Technology Co., Ltd, Hefei, China"}]},{"given":"Meng","family":"Wang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology & Hefei Comprehensive National Science Center, Hefei, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3551792"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Shahin Amiriparian Maurice Gerczuk Sandra Ottl Nicholas Cummins Michael Freitag Sergey Pugachevskiy Alice Baird and Bj\u00f6rn Schuller. 2017. Snore sound classification using image-based deep spectrum features. (2017).  Shahin Amiriparian Maurice Gerczuk Sandra Ottl Nicholas Cummins Michael Freitag Sergey Pugachevskiy Alice Baird and Bj\u00f6rn Schuller. 2017. Snore sound classification using image-based deep spectrum features. 
(2017).","DOI":"10.21437\/Interspeech.2017-434"},{"key":"e_1_3_2_2_3_1","volume-title":"Colbert: Using bert sentence embedding for humor detection. arXiv preprint arXiv:2004.12765","author":"Annamoradnejad Issa","year":"2020","unstructured":"Issa Annamoradnejad and Gohar Zoghi . 2020 . Colbert: Using bert sentence embedding for humor detection. arXiv preprint arXiv:2004.12765 (2020). Issa Annamoradnejad and Gohar Zoghi. 2020. Colbert: Using bert sentence embedding for humor detection. arXiv preprint arXiv:2004.12765 (2020)."},{"key":"e_1_3_2_2_4_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba , Jamie Ryan Kiros, and Geoffrey E Hinton . 2016 . Layer normalization. arXiv preprint arXiv:1607.06450 (2016). Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_3_2_2_5_1","volume-title":"2016 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 1--10","author":"Tadas Baltru\u0161aitis","year":"2016","unstructured":"Tadas Baltru\u0161aitis, Peter Robinson , and Louis-Philippe Morency . 2016 . Openface: an open source facial behavior analysis toolkit . In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 1--10 . Tadas Baltru\u0161aitis, Peter Robinson, and Louis-Philippe Morency. 2016. Openface: an open source facial behavior analysis toolkit. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). 
IEEE, 1--10."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3347320.3357690"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3037496"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133944.3133949"},{"key":"e_1_3_2_2_9_1","volume-title":"Dzmitry Bahdanau, and Yoshua Bengio.","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho , Bart Van Merri\u00ebnboer , Dzmitry Bahdanau, and Yoshua Bengio. 2014 . On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014). Kyunghyun Cho, Bart Van Merri\u00ebnboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014)."},{"key":"e_1_3_2_2_10_1","volume-title":"Proceedings of the 3rd Multimodal Sentiment Analysis Challenge. Association for Computing Machinery","author":"Christ Lukas","year":"2022","unstructured":"Lukas Christ , Shahin Amiriparian , Alice Baird , Panagiotis Tzirakis , Alexander Kathan , Niklas M\u00fcller , Lukas Stappen , Eva-Maria Me\u00dfner , Andreas K\u00f6nig , Alan Cowen , Erik Cambria , and Bj\u00f6rn W. Schuller . 2022. The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress . In Proceedings of the 3rd Multimodal Sentiment Analysis Challenge. Association for Computing Machinery , Lisbon, Portugal. Workshop held at ACM Multimedia 2022 , to appear. Lukas Christ, Shahin Amiriparian, Alice Baird, Panagiotis Tzirakis, Alexander Kathan, Niklas M\u00fcller, Lukas Stappen, Eva-Maria Me\u00dfner, Andreas K\u00f6nig, Alan Cowen, Erik Cambria, and Bj\u00f6rn W. Schuller. 2022. The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress. In Proceedings of the 3rd Multimodal Sentiment Analysis Challenge. Association for Computing Machinery, Lisbon, Portugal. 
Workshop held at ACM Multimedia 2022, to appear."},{"key":"e_1_3_2_2_11_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_2_12_1","volume-title":"Facial action coding system. Environmental Psychology & Nonverbal Behavior","author":"Ekman Paul","year":"1978","unstructured":"Paul Ekman and Wallace V Friesen . 1978. Facial action coding system. Environmental Psychology & Nonverbal Behavior ( 1978 ). Paul Ekman and Wallace V Friesen. 1978. Facial action coding system. Environmental Psychology & Nonverbal Behavior (1978)."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2009.5349350"},{"key":"e_1_3_2_2_14_1","volume-title":"IEEE International Conference on Acoustics.","author":"Gemmeke J. F.","unstructured":"J. F. Gemmeke , Dpw Ellis , D. Freedman , A. Jansen , and M. Ritter . 2017a. Audio Set: An ontology and human-labeled dataset for audio events . In IEEE International Conference on Acoustics. J. F. Gemmeke, Dpw Ellis, D. Freedman, A. Jansen, and M. Ritter. 2017a. Audio Set: An ontology and human-labeled dataset for audio events. 
In IEEE International Conference on Acoustics."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38067-9_8"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133944.3133946"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2020.3030497"},{"key":"e_1_3_2_2_20_1","volume-title":"STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding. arXiv preprint arXiv:2207.02756","author":"Lin Zihang","year":"2022","unstructured":"Zihang Lin , Chaolei Tan , Jian-Fang Hu , Zhi Jin , Tiancai Ye , and Wei-Shi Zheng . 2022. STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding. arXiv preprint arXiv:2207.02756 ( 2022 ). Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, and Wei-Shi Zheng. 2022. STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding. arXiv preprint arXiv:2207.02756 (2022)."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1046"},{"key":"e_1_3_2_2_22_1","volume-title":"Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov , Ilya Sutskever , Kai Chen , Greg S Corrado , and Jeff Dean . 2013. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems , Vol. 26 ( 2013 ). Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems , Vol. 
26 (2013)."},{"key":"e_1_3_2_2_23_1","volume-title":"Meta-learning with temporal convolutions. arXiv preprint arXiv:1707.03141","author":"Mishra Nikhil","year":"2017","unstructured":"Nikhil Mishra , Mostafa Rohaninejad , Xi Chen , and Pieter Abbeel . 2017. Meta-learning with temporal convolutions. arXiv preprint arXiv:1707.03141 , Vol. 2 , 7 ( 2017 ), 23. Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. 2017. Meta-learning with temporal convolutions. arXiv preprint arXiv:1707.03141 , Vol. 2, 7 (2017), 23."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2017.2740923"},{"key":"e_1_3_2_2_25_1","volume-title":"Augly: Data augmentations for robustness. arXiv preprint arXiv:2201.06494","author":"Papakipos Zoe","year":"2022","unstructured":"Zoe Papakipos and Joanna Bitton . 2022 . Augly: Data augmentations for robustness. arXiv preprint arXiv:2201.06494 (2022). Zoe Papakipos and Joanna Bitton. 2022. Augly: Data augmentations for robustness. arXiv preprint arXiv:2201.06494 (2022)."},{"key":"e_1_3_2_2_26_1","unstructured":"Maciej Pawlowski Anna Wr\u00f3blewska and Sylwia Sysko-Roma\u0144czuk. 2022. Does a Technique for Building Multimodal Representation Matter? -- Comparative Analysis. arXiv preprint arXiv:2206.06367.  Maciej Pawlowski Anna Wr\u00f3blewska and Sylwia Sysko-Roma\u0144czuk. 2022. Does a Technique for Building Multimodal Representation Matter? -- Comparative Analysis. arXiv preprint arXiv:2206.06367."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0055"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.3765\/bls.v5i0.2164"},{"key":"e_1_3_2_2_30_1","volume-title":"Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084","author":"Reimers Nils","year":"2019","unstructured":"Nils Reimers and Iryna Gurevych . 2019 . 
Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084 (2019). Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084 (2019)."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"crossref","unstructured":"Olga Russakovsky Jia Deng Hao Su Jonathan Krause Sanjeev Satheesh Sean Ma Zhiheng Huang Andrej Karpathy Aditya Khosla Michael Bernstein etal 2015. Imagenet large scale visual recognition challenge. International journal of computer vision Vol. 115 3 (2015) 211--252.  Olga Russakovsky Jia Deng Hao Su Jonathan Krause Sanjeev Satheesh Sean Ma Zhiheng Huang Andrej Karpathy Aditya Khosla Michael Bernstein et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision Vol. 115 3 (2015) 211--252.","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_2_32_1","unstructured":"Burak Satar Hongyuan Zhu Hanwang Zhang and Joo Hwee Lim. 2022. RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval. arXiv preprint arXiv:2206.12845 (2022).  Burak Satar Hongyuan Zhu Hanwang Zhang and Joo Hwee Lim. 2022. RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval. arXiv preprint arXiv:2206.12845 (2022)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Bj\u00f6rn Schuller Stefan Steidl and Anton Batliner. 2009. The interspeech 2009 emotion challenge. (2009).  Bj\u00f6rn Schuller Stefan Steidl and Anton Batliner. 2009. The interspeech 2009 emotion challenge. (2009).","DOI":"10.21437\/Interspeech.2009-103"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Bj\u00f6rn Schuller Stefan Steidl Anton Batliner Alessandro Vinciarelli Klaus Scherer Fabien Ringeval Mohamed Chetouani Felix Weninger Florian Eyben Erik Marchi etal 2013. The INTERSPEECH 2013 computational paralinguistics challenge: Social signals conflict emotion autism. 
In Proceedings INTERSPEECH 2013 14th Annual Conference of the International Speech Communication Association Lyon France.  Bj\u00f6rn Schuller Stefan Steidl Anton Batliner Alessandro Vinciarelli Klaus Scherer Fabien Ringeval Mohamed Chetouani Felix Weninger Florian Eyben Erik Marchi et al. 2013. The INTERSPEECH 2013 computational paralinguistics challenge: Social signals conflict emotion autism. In Proceedings INTERSPEECH 2013 14th Annual Conference of the International Speech Communication Association Lyon France.","DOI":"10.21437\/Interspeech.2013-56"},{"key":"e_1_3_2_2_35_1","volume-title":"Interspeech","author":"Sebastian Jilt","year":"2019","unstructured":"Jilt Sebastian and Piero Pierucci . 2019. Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts . In Interspeech 2019 . Jilt Sebastian and Piero Pierucci. 2019. Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts. In Interspeech 2019."},{"key":"e_1_3_2_2_36_1","volume-title":"Attention is all you need. Advances in neural information processing systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems , Vol. 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems , Vol. 
30 (2017)."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2013.34"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001499000495"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3266302.3266313"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3551876.3554809","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3551876.3554809","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:16Z","timestamp":1750186816000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3551876.3554809"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":39,"alternative-id":["10.1145\/3551876.3554809","10.1145\/3551876"],"URL":"https:\/\/doi.org\/10.1145\/3551876.3554809","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}