{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T16:30:06Z","timestamp":1779294606030,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,21]],"date-time":"2021-08-21T00:00:00Z","timestamp":1629504000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["812997"],"award-info":[{"award-number":["812997"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,21]]},"DOI":"10.1145\/3463945.3469058","type":"proceedings-article","created":{"date-parts":[[2021,8,27]],"date-time":"2021-08-27T14:29:53Z","timestamp":1630074593000},"page":"37-45","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":23,"title":["A Fair and Comprehensive Comparison of Multimodal Tweet Sentiment Analysis Methods"],"prefix":"10.1145","author":[{"given":"Gullal S.","family":"Cheema","sequence":"first","affiliation":[{"name":"TIB - Leibniz Information Center for Science and Technology, Hannover, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sherzod","family":"Hakimov","sequence":"additional","affiliation":[{"name":"TIB - Leibniz Information Center for Science and Technology, Hannover, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eric","family":"M\u00fcller-Budack","sequence":"additional","affiliation":[{"name":"TIB - Leibniz Information Center for Science and Technology, Hannover, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ralph","family":"Ewerth","sequence":"additional","affiliation":[{"name":"TIB - Leibniz Information Center for Science and Technology & Leibniz University Hannover, Hannover, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,8,27]]},"reference":[{"issue":"4","key":"e_1_3_2_2_1_1","first-page":"8","article-title":"A Combined CNN and LSTM Model for Arabic Sentiment Analysis","volume":"8","author":"Alayba Abdulaziz M.","year":"2018","unstructured":"Abdulaziz M. Alayba , Vasile Palade , Matthew England , and Rahat Iqbal . 2018 . A Combined CNN and LSTM Model for Arabic Sentiment Analysis . In Machine Learning and Knowledge Extraction - Second IFIP TC 5, TC 8\/WG 8 . 4 , 8 .9, TC 12\/WG 12.9 International Cross-Domain Conference, CD-MAKE 2018, Hamburg, Germany, August 27--30, 2018, Proceedings (Lecture Notes in Computer Science, Vol. 11015), Andreas Holzinger, Peter Kieseberg, A Min Tjoa, and Edgar R. Weippl (Eds.). Springer, 179--191. https:\/\/doi.org\/10.1007\/978--3--319--99740--7_12 10.1007\/978--3--319--99740--7_12 Abdulaziz M. Alayba, Vasile Palade, Matthew England, and Rahat Iqbal. 2018. A Combined CNN and LSTM Model for Arabic Sentiment Analysis. In Machine Learning and Knowledge Extraction - Second IFIP TC 5, TC 8\/WG 8.4, 8.9, TC 12\/WG 12.9 International Cross-Domain Conference, CD-MAKE 2018, Hamburg, Germany, August 27--30, 2018, Proceedings (Lecture Notes in Computer Science, Vol. 11015), Andreas Holzinger, Peter Kieseberg, A Min Tjoa, and Edgar R. Weippl (Eds.). Springer, 179--191. 
https:\/\/doi.org\/10.1007\/978--3--319--99740--7_12","journal-title":"Machine Learning and Knowledge Extraction - Second IFIP TC 5, TC 8\/WG"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2126"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502268"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502282"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1080\/02699930902975754"},{"key":"e_1_3_2_2_6_1","volume-title":"NLPCC 2015, Nanchang, China, October 9--13, 2015, Proceedings (Lecture Notes in Computer Science","volume":"167","author":"Cai Guoyong","year":"2015","unstructured":"Guoyong Cai and Binbin Xia . 2015 . Convolutional Neural Networks for Multimedia Sentiment Analysis. In Natural Language Processing and Chinese Computing - 4th CCF Conference , NLPCC 2015, Nanchang, China, October 9--13, 2015, Proceedings (Lecture Notes in Computer Science , Vol. 9362), Juanzi Li, Heng Ji, Dongyan Zhao, and Yansong Feng (Eds.). Springer, 159-- 167 . https:\/\/doi.org\/10.1007\/978--3--319--25207-0_14 10.1007\/978--3--319--25207-0_14 Guoyong Cai and Binbin Xia. 2015. Convolutional Neural Networks for Multimedia Sentiment Analysis. In Natural Language Processing and Chinese Computing - 4th CCF Conference, NLPCC 2015, Nanchang, China, October 9--13, 2015, Proceedings (Lecture Notes in Computer Science, Vol. 9362), Juanzi Li, Heng Ji, Dongyan Zhao, and Yansong Feng (Eds.). Springer, 159--167. https:\/\/doi.org\/10.1007\/978--3--319--25207-0_14"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00530-014-0407-8"},{"key":"e_1_3_2_2_8_1","unstructured":"Pierre-Luc Carrier and Aaron Courville. 2013. Challenges in Representation Learning: Facial Expression Recognition Challenge.  Pierre-Luc Carrier and Aaron Courville. 2013. 
Challenges in Representation Learning: Facial Expression Recognition Challenge."},{"key":"e_1_3_2_2_9_1","volume-title":"The interface between emotion and attention: A review of evidence from psychology and neuroscience. Behavioral and cognitive neuroscience reviews","author":"Compton Rebecca J","year":"2003","unstructured":"Rebecca J Compton . 2003. The interface between emotion and attention: A review of evidence from psychology and neuroscience. Behavioral and cognitive neuroscience reviews , Vol. 2 , 2 ( 2003 ), 115--129. Rebecca J Compton. 2003. The interface between emotion and attention: A review of evidence from psychology and neuroscience. Behavioral and cognitive neuroscience reviews, Vol. 2, 2 (2003), 115--129."},{"key":"e_1_3_2_2_10_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019 , Minneapolis, MN, USA, June 2--7 , 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/n19--1423 10.18653\/v1 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 
In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/n19--1423"},{"key":"e_1_3_2_2_11_1","volume-title":"Emotional Attention: A Study of Image Sentiment and Visual Attention. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018","author":"Fan Shaojing","year":"2018","unstructured":"Shaojing Fan , Zhiqi Shen , Ming Jiang , Bryan L. Koenig , Juan Xu , Mohan S. Kankanhalli , and Qi Zhao . 2018 . Emotional Attention: A Study of Image Sentiment and Visual Attention. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018 , Salt Lake City, UT, USA, June 18--22 , 2018. IEEE Computer Society, 7521--7531. https:\/\/doi.org\/10.1109\/CVPR.2018.00785 10.1109\/CVPR.2018.00785 Shaojing Fan, Zhiqi Shen, Ming Jiang, Bryan L. Koenig, Juan Xu, Mohan S. Kankanhalli, and Qi Zhao. 2018. Emotional Attention: A Study of Image Sentiment and Visual Attention. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18--22, 2018. IEEE Computer Society, 7521--7531. https:\/\/doi.org\/10.1109\/CVPR.2018.00785"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.005"},{"key":"e_1_3_2_2_13_1","volume-title":"Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2016 . Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 , Las Vegas, NV, USA, June 27--30 , 2016. 
IEEE Computer Society, 770--778. https:\/\/doi.org\/10.1109\/CVPR.2016.90 10.1109\/CVPR.2016.90 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27--30, 2016. IEEE Computer Society, 770--778. https:\/\/doi.org\/10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_14_1","volume-title":"Modeling Rich Contexts for Sentiment Classification with LSTM. CoRR","author":"Huang Minlie","year":"2016","unstructured":"Minlie Huang , Yujie Cao , and Chao Dong . 2016. Modeling Rich Contexts for Sentiment Classification with LSTM. CoRR , Vol. abs\/ 1605 .01478 ( 2016 ). arxiv: 1605.01478 http:\/\/arxiv.org\/abs\/1605.01478 Minlie Huang, Yujie Cao, and Chao Dong. 2016. Modeling Rich Contexts for Sentiment Classification with LSTM. CoRR, Vol. abs\/1605.01478 (2016). arxiv: 1605.01478 http:\/\/arxiv.org\/abs\/1605.01478"},{"key":"e_1_3_2_2_15_1","volume-title":"PAKDD 2020, Singapore, May 11--14, 2020, Proceedings, Part II (Lecture Notes in Computer Science","volume":"797","author":"Jiang Tao","year":"2020","unstructured":"Tao Jiang , Jiahai Wang , Zhiyue Liu , and Yingbiao Ling . 2020 . Fusion-Extraction Network for Multimodal Sentiment Analysis. In Advances in Knowledge Discovery and Data Mining - 24th Pacific-Asia Conference , PAKDD 2020, Singapore, May 11--14, 2020, Proceedings, Part II (Lecture Notes in Computer Science , Vol. 12085), Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, and Sinno Jialin Pan (Eds.). Springer, 785-- 797 . https:\/\/doi.org\/10.1007\/978--3-030--47436--2_59 10.1007\/978--3-030--47436--2_59 Tao Jiang, Jiahai Wang, Zhiyue Liu, and Yingbiao Ling. 2020. Fusion-Extraction Network for Multimodal Sentiment Analysis. 
In Advances in Knowledge Discovery and Data Mining - 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11--14, 2020, Proceedings, Part II (Lecture Notes in Computer Science, Vol. 12085), Hady W. Lauw, Raymond Chi-Wing Wong, Alexandros Ntoulas, Ee-Peng Lim, See-Kiong Ng, and Sinno Jialin Pan (Eds.). Springer, 785--797. https:\/\/doi.org\/10.1007\/978--3-030--47436--2_59"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_2_2_17_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba . 2015 . Adam : A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds .). http:\/\/arxiv.org\/abs\/1412.6980 Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_2_18_1","volume-title":"DSFD: Dual Shot Face Detector. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019","author":"Li Jian","year":"2019","unstructured":"Jian Li , Yabiao Wang , Changan Wang , Ying Tai , Jianjun Qian , Jian Yang , Chengjie Wang , Jilin Li , and Feiyue Huang . 2019 . DSFD: Dual Shot Face Detector. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019 , Long Beach, CA, USA, June 16--20 , 2019. Computer Vision Foundation \/ IEEE, 5060--5069. https:\/\/doi.org\/10.1109\/CVPR.2019.00520 10.1109\/CVPR.2019.00520 Jian Li, Yabiao Wang, Changan Wang, Ying Tai, Jianjun Qian, Jian Yang, Chengjie Wang, Jilin Li, and Feiyue Huang. 2019. DSFD: Dual Shot Face Detector. 
In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16--20, 2019. Computer Vision Foundation \/ IEEE, 5060--5069. https:\/\/doi.org\/10.1109\/CVPR.2019.00520"},{"key":"e_1_3_2_2_19_1","volume-title":"Proceedings, Part XXX (Lecture Notes in Computer Science","volume":"137","author":"Li Xiujun","year":"2020","unstructured":"Xiujun Li , Xi Yin , Chunyuan Li , Pengchuan Zhang , Xiaowei Hu , Lei Zhang , Lijuan Wang , Houdong Hu , Li Dong , Furu Wei , Yejin Choi , and Jianfeng Gao . 2020 . Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part XXX (Lecture Notes in Computer Science , Vol. 12375), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer, 121-- 137 . https:\/\/doi.org\/10.1007\/978--3-030--58577--8_8 10.1007\/978--3-030--58577--8_8 Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, and Jianfeng Gao. 2020. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXX (Lecture Notes in Computer Science, Vol. 12375), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.). Springer, 121--137. https:\/\/doi.org\/10.1007\/978--3-030--58577--8_8"},{"key":"e_1_3_2_2_20_1","volume-title":"A Survey of Opinion Mining and Sentiment Analysis","author":"Liu Bing","unstructured":"Bing Liu and Lei Zhang . 2012. A Survey of Opinion Mining and Sentiment Analysis . In Mining Text Data, Charu C. Aggarwal and ChengXiang Zhai (Eds.). Springer , 415--463. https:\/\/doi.org\/10.1007\/978--1--4614--3223--4_13 10.1007\/978--1--4614--3223--4_13 Bing Liu and Lei Zhang. 2012. A Survey of Opinion Mining and Sentiment Analysis. In Mining Text Data, Charu C. 
Aggarwal and ChengXiang Zhai (Eds.). Springer, 415--463. https:\/\/doi.org\/10.1007\/978--1--4614--3223--4_13"},{"key":"e_1_3_2_2_21_1","volume-title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu , Myle Ott , Naman Goyal , Jingfei Du , Mandar Joshi , Danqi Chen , Omer Levy , Mike Lewis , Luke Zettlemoyer , and Veselin Stoyanov . 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR , Vol. abs\/ 1907 .11692 ( 2019 ). arxiv: 1907.11692 http:\/\/arxiv.org\/abs\/1907.11692 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, Vol. abs\/1907.11692 (2019). arxiv: 1907.11692 http:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_2_22_1","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu , Dhruv Batra , Devi Parikh , and Stefan Lee . 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks . In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 , NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9 -Buc, Emily B. Fox, and Roman Garnett (Eds .). 13--23. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/c74d97b01eae257e44aa9d5bade97baf-Abstract.html Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. 
Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9 -Buc, Emily B. Fox, and Roman Garnett (Eds.). 13--23. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/c74d97b01eae257e44aa9d5bade97baf-Abstract.html"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asej.2014.04.011"},{"key":"e_1_3_2_2_24_1","volume-title":"MMM 2016, Miami, FL, USA, January 4--6, 2016, Proceedings, Part II (Lecture Notes in Computer Science","volume":"27","author":"Niu Teng","year":"2016","unstructured":"Teng Niu , Shiai Zhu , Lei Pang , and Abdulmotaleb El-Saddik . 2016 . Sentiment Analysis on Multi-View Social Data. In MultiMedia Modeling - 22nd International Conference , MMM 2016, Miami, FL, USA, January 4--6, 2016, Proceedings, Part II (Lecture Notes in Computer Science , Vol. 9517), Qi Tian, Nicu Sebe, Guo-Jun Qi, Benoit Huet, Richang Hong, and Xueliang Liu (Eds.). Springer, 15-- 27 . https:\/\/doi.org\/10.1007\/978--3--319--27674--8_2 10.1007\/978--3--319--27674--8_2 Teng Niu, Shiai Zhu, Lei Pang, and Abdulmotaleb El-Saddik. 2016. Sentiment Analysis on Multi-View Social Data. In MultiMedia Modeling - 22nd International Conference, MMM 2016, Miami, FL, USA, January 4--6, 2016, Proceedings, Part II (Lecture Notes in Computer Science, Vol. 9517), Qi Tian, Nicu Sebe, Guo-Jun Qi, Benoit Huet, Richang Hong, and Xueliang Liu (Eds.). Springer, 15--27. https:\/\/doi.org\/10.1007\/978--3--319--27674--8_2"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011139631724"},{"key":"e_1_3_2_2_26_1","volume-title":"Manning","author":"Pennington Jeffrey","year":"2014","unstructured":"Jeffrey Pennington , Richard Socher , and Christopher D . Manning . 2014 . Glove : Global Vectors for Word Representation. 
In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25--29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, Alessandro Moschitti, Bo Pang, and Walter Daelemans (Eds.). ACL, 1532--1543. https:\/\/doi.org\/10.3115\/v1\/d14--1162 10.3115\/v1 Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25--29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, Alessandro Moschitti, Bo Pang, and Walter Daelemans (Eds.). ACL, 1532--1543. https:\/\/doi.org\/10.3115\/v1\/d14--1162"},{"key":"e_1_3_2_2_27_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021 . Learning Transferable Visual Models From Natural Language Supervision. CoRR , Vol. abs\/ 2103 .00020 (2021). arxiv: 2103.00020 https:\/\/arxiv.org\/abs\/2103.00020 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. CoRR, Vol. abs\/2103.00020 (2021). 
arxiv: 2103.00020 https:\/\/arxiv.org\/abs\/2103.00020"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2015.01.005"},{"key":"e_1_3_2_2_30_1","volume-title":"Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2017","author":"Shin Bonggun","year":"2017","unstructured":"Bonggun Shin , Timothy Lee , and Jinho D. Choi . 2017. Lexicon Integrated CNN Models with Attention for Sentiment Analysis . In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2017 , Copenhagen, Denmark , September 8, 2017 , Alexandra Balahur, Saif M. Mohammad, and Erik van der Goot (Eds.). Association for Computational Linguistics, 149--158. https:\/\/doi.org\/10.18653\/v1\/w17--5220 10.18653\/v1 Bonggun Shin, Timothy Lee, and Jinho D. Choi. 2017. Lexicon Integrated CNN Models with Attention for Sentiment Analysis. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2017, Copenhagen, Denmark, September 8, 2017, Alexandra Balahur, Saif M. Mohammad, and Erik van der Goot (Eds.). Association for Computational Linguistics, 149--158. https:\/\/doi.org\/10.18653\/v1\/w17--5220"},{"key":"e_1_3_2_2_31_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman . 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition . In 3rd International Conference on Learning Representations, ICLR 2015 , San Diego, CA , USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds .). http:\/\/arxiv.org\/abs\/1409.1556 Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. 
In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2017.08.003"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21416"},{"key":"e_1_3_2_2_34_1","volume-title":"2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001","author":"Paul","year":"2001","unstructured":"Paul A. Viola and Michael J. Jones. 2001. Rapid Object Detection using a Boosted Cascade of Simple Features . In 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001 ), with CD-ROM, 8--14 December 2001 , Kauai, HI, USA. IEEE Computer Society, 511--518. https:\/\/doi.org\/10.1109\/CVPR. 2001.990517 10.1109\/CVPR.2001.990517 Paul A. Viola and Michael J. Jones. 2001. Rapid Object Detection using a Boosted Cascade of Simple Features. In 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), with CD-ROM, 8--14 December 2001, Kauai, HI, USA. IEEE Computer Society, 511--518. https:\/\/doi.org\/10.1109\/CVPR.2001.990517"},{"key":"e_1_3_2_2_35_1","unstructured":"WuJie. 2018. Facial Expression Recognition. https:\/\/github.com\/WuJie1010\/Facial-Expression-Recognition.Pytorch .  WuJie. 2018. Facial Expression Recognition. https:\/\/github.com\/WuJie1010\/Facial-Expression-Recognition.Pytorch ."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3133142"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210093"},{"key":"e_1_3_2_2_38_1","volume-title":"Weakly Supervised Coupled Networks for Visual Sentiment Analysis. 
In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018","author":"Yang Jufeng","year":"2018","unstructured":"Jufeng Yang , Dongyu She , Yu-Kun Lai , Paul L. Rosin , and Ming-Hsuan Yang . 2018 . Weakly Supervised Coupled Networks for Visual Sentiment Analysis. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018 , Salt Lake City, UT, USA, June 18--22 , 2018. IEEE Computer Society, 7584--7592. https:\/\/doi.org\/10.1109\/CVPR.2018.00791 10.1109\/CVPR.2018.00791 Jufeng Yang, Dongyu She, Yu-Kun Lai, Paul L. Rosin, and Ming-Hsuan Yang. 2018. Weakly Supervised Coupled Networks for Visual Sentiment Analysis. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18--22, 2018. IEEE Computer Society, 7584--7592. https:\/\/doi.org\/10.1109\/CVPR.2018.00791"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/456"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/2887007.2887061"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/3015812.3015857"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.3390\/a9020041"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502069.2502079"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00124"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/780"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351062"},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2723009"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/503"}],"event":{"name":"ICMR '21: International Conference on Multimedia Retrieval","location":"Taipei Taiwan","acronym":"ICMR '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2021 Workshop on Multi-Modal 
Pre-Training for Multimedia Understanding"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3463945.3469058","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3463945.3469058","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:15Z","timestamp":1750191135000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3463945.3469058"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,21]]},"references-count":48,"alternative-id":["10.1145\/3463945.3469058","10.1145\/3463945"],"URL":"https:\/\/doi.org\/10.1145\/3463945.3469058","relation":{},"subject":[],"published":{"date-parts":[[2021,8,21]]},"assertion":[{"value":"2021-08-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}