{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T21:41:36Z","timestamp":1767994896060,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62176078"],"award-info":[{"award-number":["62176078"]}]},{"name":"National Key R&D Program of China","award":["2018YFB1005103"],"award-info":[{"award-number":["2018YFB1005103"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3551876.3554813","type":"proceedings-article","created":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T22:17:21Z","timestamp":1664403441000},"page":"101-109","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Leveraging Multi-modal Interactions among the Intermediate Representations of Deep Transformers for Emotion Recognition"],"prefix":"10.1145","author":[{"given":"Yang","family":"Wu","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pai","family":"Peng","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanyan","family":"Zhao","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bing","family":"Qin","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644","author":"Alain Guillaume","year":"2016","unstructured":"Guillaume Alain and Yoshua Bengio . 2016. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644 ( 2016 ). Guillaume Alain and Yoshua Bengio. 2016. Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644 (2016)."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3551792"},{"key":"e_1_3_2_1_3_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba , Jamie Ryan Kiros, and Geoffrey E Hinton . 2016 . Layer normalization. arXiv preprint arXiv:1607.06450 (2016). Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1208"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2018.00019"},{"key":"e_1_3_2_1_6_1","volume-title":"IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation","author":"Busso Carlos","year":"2008","unstructured":"Carlos Busso , Murtaza Bulut , Chi-Chun Lee , Abe Kazemzadeh , Emily Mower , Samuel Kim , Jeannette N Chang , Sungbok Lee , and Shrikanth S Narayanan . 2008 . IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation , Vol. 42 , 4 (2008), 335--359. Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. 2008. IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation , Vol. 42, 4 (2008), 335--359."},{"key":"e_1_3_2_1_7_1","volume-title":"Shallowing deep networks: Layer-wise pruning based on feature representations","author":"Chen Shi","year":"2018","unstructured":"Shi Chen and Qi Zhao . 2018. Shallowing deep networks: Layer-wise pruning based on feature representations . IEEE transactions on pattern analysis and machine intelligence, Vol. 41 , 12 ( 2018 ), 3048--3056. Shi Chen and Qi Zhao. 2018. Shallowing deep networks: Layer-wise pruning based on feature representations. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 12 (2018), 3048--3056."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the 3rd Multimodal Sentiment Analysis Challenge. Association for Computing Machinery","author":"Christ Lukas","year":"2022","unstructured":"Lukas Christ , Shahin Amiriparian , Alice Baird , Panagiotis Tzirakis , Alexander Kathan , Niklas M\u00fcller , Lukas Stappen , Eva-Maria Me\u00dfner , Andreas K\u00f6nig , Alan Cowen , Erik Cambria , and Bj\u00f6rn W. Schuller . 2022. The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress . In Proceedings of the 3rd Multimodal Sentiment Analysis Challenge. Association for Computing Machinery , Lisbon, Portugal. Workshop held at ACM Multimedia 2022 , to appear. Lukas Christ, Shahin Amiriparian, Alice Baird, Panagiotis Tzirakis, Alexander Kathan, Niklas M\u00fcller, Lukas Stappen, Eva-Maria Me\u00dfner, Andreas K\u00f6nig, Alan Cowen, Erik Cambria, and Bj\u00f6rn W. Schuller. 2022. The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress. In Proceedings of the 3rd Multimodal Sentiment Analysis Challenge. Association for Computing Machinery, Lisbon, Portugal. Workshop held at ACM Multimedia 2022, to appear."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.417"},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the 1st Conference of the Asia-Pacific","author":"Dai Wenliang","year":"2020","unstructured":"Wenliang Dai , Zihan Liu , Tiezheng Yu , and Pascale Fung . 2020. Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition . In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics , Suzhou, China , 269--280. https:\/\/aclanthology.org\/ 2020 .aacl-main.30 Wenliang Dai, Zihan Liu, Tiezheng Yu, and Pascale Fung. 2020. Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. Association for Computational Linguistics, Suzhou, China, 269--280. https:\/\/aclanthology.org\/2020.aacl-main.30"},{"key":"e_1_3_2_1_11_1","volume-title":"International Conference on Learning Representations.","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , 2020 . An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale . In International Conference on Learning Representations. Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874246"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1015"},{"key":"e_1_3_2_1_14_1","volume-title":"Vision transformers with patch diversification. arXiv preprint arXiv:2104.12753","author":"Gong Chengyue","year":"2021","unstructured":"Chengyue Gong , Dilin Wang , Meng Li , Vikas Chandra , and Qiang Liu . 2021b. Vision transformers with patch diversification. arXiv preprint arXiv:2104.12753 ( 2021 ). Chengyue Gong, Dilin Wang, Meng Li, Vikas Chandra, and Qiang Liu. 2021b. Vision transformers with patch diversification. arXiv preprint arXiv:2104.12753 (2021)."},{"key":"e_1_3_2_1_15_1","first-page":"571","article-title":"AST","volume":"2021","author":"Gong Yuan","year":"2021","unstructured":"Yuan Gong , Yu-An Chung , and James Glass . 2021 a. AST : Audio Spectrogram Transformer. In Proc. Interspeech 2021. 571 -- 575 . https:\/\/doi.org\/10.21437\/Interspeech.2021--698 10.21437\/Interspeech.2021--698 Yuan Gong, Yu-An Chung, and James Glass. 2021a. AST: Audio Spectrogram Transformer. In Proc. Interspeech 2021. 571--575. https:\/\/doi.org\/10.21437\/Interspeech.2021--698","journal-title":"Audio Spectrogram Transformer. In Proc. Interspeech"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413678"},{"key":"e_1_3_2_1_17_1","volume-title":"Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415","author":"Hendrycks Dan","year":"2016","unstructured":"Dan Hendrycks and Kevin Gimpel . 2016. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 ( 2016 ). Dan Hendrycks and Kevin Gimpel. 2016. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683898"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1356"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01330"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413577"},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of NAACL-HLT. 4171--4186","author":"Ming-Wei Chang Jacob Devlin","year":"2019","unstructured":"Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova . 2019 . Bert: Pre-training of deep bidirectional transformers for language understanding . In Proceedings of NAACL-HLT. 4171--4186 . Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT. 4171--4186."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.197"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.574"},{"key":"e_1_3_2_1_25_1","volume-title":"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=H1eA7AEtvS","author":"Lan Zhenzhong","year":"2020","unstructured":"Zhenzhong Lan , Mingda Chen , Sebastian Goodman , Kevin Gimpel , Piyush Sharma , and Radu Soricut . 2020 . ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=H1eA7AEtvS Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=H1eA7AEtvS"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054458"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00258"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016818"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU51503.2021.9688093"},{"key":"e_1_3_2_1_31_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=dUV91uaXm3","author":"Shi Han","year":"2022","unstructured":"Han Shi , Jiahui Gao , Hang Xu , Xiaodan Liang , Zhenguo Li , Lingpeng Kong , Stephen M. S. Lee , and James Kwok . 2022 . Revisiting Over-smoothing in BERT from the Perspective of Graph . In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=dUV91uaXm3 Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, and James Kwok. 2022. Revisiting Over-smoothing in BERT from the Perspective of Graph. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=dUV91uaXm3"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1142"},{"key":"e_1_3_2_1_33_1","volume-title":"International Conference on Machine Learning. PMLR, 10347--10357","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron , Matthieu Cord , Matthijs Douze , Francisco Massa , Alexandre Sablayrolles , and Herv\u00e9 J\u00e9gou . 2021 . Training data-efficient image transformers & distillation through attention . In International Conference on Machine Learning. PMLR, 10347--10357 . Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning. PMLR, 10347--10357."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1656"},{"key":"e_1_3_2_1_35_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008.  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.417"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.724"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10590-1_53"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3551876.3554813","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3551876.3554813","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:17Z","timestamp":1750186817000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3551876.3554813"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":38,"alternative-id":["10.1145\/3551876.3554813","10.1145\/3551876"],"URL":"https:\/\/doi.org\/10.1145\/3551876.3554813","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}