{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T10:52:29Z","timestamp":1784285549724,"version":"3.55.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"2s","license":[{"start":{"date-parts":[[2023,2,17]],"date-time":"2023-02-17T00:00:00Z","timestamp":1676592000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61602161, 61772180"],"award-info":[{"award-number":["61602161, 61772180"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Hubei Province Science and Technology Support Project","award":["2020BAB012"],"award-info":[{"award-number":["2020BAB012"]}]},{"name":"The Fundamental Research Funds for the Research Fund of Key Lab of Traffic and Internet of Things","award":["WUT:2015-015-A03"],"award-info":[{"award-number":["WUT:2015-015-A03"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2023,6,30]]},"abstract":"<jats:p>Sentiment analysis of one modality (e.g., text or image) has been broadly studied. However, not much attention has been paid to the sentiment analysis of multi-modal data. As the research on and applications of multi-modal data analysis are becoming more and more broad, it is necessary to optimize BERT internal structure. This article proposes a hierarchical multi-head self-attention and gate channel BERT, which is an optimized BERT model. The model is composed of three modules: the hierarchical multi-head self-attention module realizes the hierarchical extraction process of features; the gate channel module replaces BERT\u2019s original Feed Forward layer to realize information filtering; and the tensor fusion model based on a self-attention mechanism is utilized to implement the fusion process of different modal features. Experiments show that our method achieves promising results and improves accuracy by 5\u20136% when compared with traditional models on the CMU-MOSI dataset.<\/jats:p>","DOI":"10.1145\/3566126","type":"journal-article","created":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T13:05:58Z","timestamp":1666011958000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["A Optimized BERT for Multimodal Sentiment Analysis"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9683-053X","authenticated-orcid":false,"given":"Jun","family":"Wu","sequence":"first","affiliation":[{"name":"School of Computer Science, Hubei University of Technology, Wuhan, Hubei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6514-8055","authenticated-orcid":false,"given":"Tianliang","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Hubei University of Technology, Wuhan, Hubei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4552-134X","authenticated-orcid":false,"given":"Jiahui","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Hubei University of Technology, Wuhan, Hubei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2511-1247","authenticated-orcid":false,"given":"Tianyi","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science, Hubei University of Technology, Wuhan, Hubei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9620-3421","authenticated-orcid":false,"given":"Chunzhi","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Hubei University of Technology, Wuhan, Hubei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,2,17]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Mehdi Arjmand Mohammad Javad Dousti and Hadi Moradi. 2021. TEASEL: A transformer-based speech-prefixed language model. https:\/\/arxiv.org\/abs\/2109.05522."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1208"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3136755.3136801"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1088\/1757-899x\/490\/6\/062063"},{"key":"e_1_3_1_6_2","article-title":"Learning phrase representations using RNN encoder-decoder for statistical machine translation","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho, Bart Van Merri\u00ebnboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078. Retrieved from https:\/\/arxiv.org\/abs\/1406.1078.","journal-title":"arXiv:1406.1078"},{"key":"e_1_3_1_7_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. arxiv:cs.CL\/1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016892"},{"key":"e_1_3_1_9_2","unstructured":"Pengcheng He Xiaodong Liu Jianfeng Gao and Weizhu Chen. 2020. DeBERTa: Decoding-enhanced BERT with disentangled attention. arxiv:2006.03654. Retrieved from DOI:https:\/\/arxiv.org\/abs\/2006.03654."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3388861"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-10336-3"},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Paul Pu Liang Ziyin Liu Amir Zadeh and Louis-Philippe Morency. 2018. Multimodal language analysis with recurrent multistage fusion. arxiv:cs.LG\/1808.03920. Retrieved from https:\/\/arxiv.org\/abs\/1808.03920.","DOI":"10.18653\/v1\/D18-1014"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3464425"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3300939"},{"key":"e_1_3_1_16_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arxiv:cs.CL\/1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692."},{"key":"e_1_3_1_17_2","unstructured":"Zhun Liu Ying Shen Varun Bharadhwaj Lakshminarasimhan Paul Pu Liang Amir Zadeh and Louis-Philippe Morency. 2018. Efficient Low-rank multimodal fusion with modality-specific factors. arxiv:cs.AI\/1806.00064. Retrieved from https:\/\/arxiv.org\/abs\/1806.00064."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2070481.2070509"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/161468.161469"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3402886"},{"key":"e_1_3_1_22_2","first-page":"6558","volume-title":"Proceedings of the Conference of the Association for Computational Linguistics","volume":"2019","author":"Tsai Yao-Hung Hubert","year":"2019","unstructured":"Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2019. Multimodal transformer for unaligned multimodal language sequences. In Proceedings of the Conference of the Association for Computational Linguistics, Vol. 2019. NIH Public Access, 6558."},{"key":"e_1_3_1_23_2","unstructured":"Yao-Hung Hubert Tsai Paul Pu Liang Amir Zadeh Louis-Philippe Morency and Ruslan Salakhutdinov. 2019. Learning Factorized Multimodal Representations. arxiv:cs.LG\/1806.06176. Retrieved from https:\/\/arxiv.org\/abs\/1806.06176."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2020.07.022"},{"key":"e_1_3_1_25_2","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc."},{"key":"e_1_3_1_26_2","unstructured":"Wei Wang Bin Bi Ming Yan Chen Wu Zuyi Bao Jiangnan Xia Liwei Peng and Luo Si. 2019. StructBERT: Incorporating language structures into pre-training for deep language understanding. Arxiv.1908.04577. Retrieved from https:\/\/arxiv.org\/abs\/1908.04577."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2013.34"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458281"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413690"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Amir Zadeh Minghai Chen Soujanya Poria Erik Cambria and Louis-Philippe Morency. 2017. Tensor fusion network for multimodal sentiment analysis. arxiv:cs.CL\/1707.07250. Retrieved from https:\/\/arxiv.org\/abs\/1707.07250.","DOI":"10.18653\/v1\/D17-1115"},{"key":"e_1_3_1_31_2","unstructured":"Amir Zadeh Rowan Zellers Eli Pincus and Louis-Philippe Morency. 2016. MOSI: Multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arxiv:cs.CL\/1606.06259. Retrieved from https:\/\/arxiv.org\/abs\/1606.06259."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3566126","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3566126","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:31Z","timestamp":1750182691000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3566126"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,17]]},"references-count":30,"journal-issue":{"issue":"2s","published-print":{"date-parts":[[2023,6,30]]}},"alternative-id":["10.1145\/3566126"],"URL":"https:\/\/doi.org\/10.1145\/3566126","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,17]]},"assertion":[{"value":"2022-03-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-08-16","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}