{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:45:31Z","timestamp":1766137531984,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,21]],"date-time":"2020-10-21T00:00:00Z","timestamp":1603238400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,21]]},"DOI":"10.1145\/3382507.3418821","type":"proceedings-article","created":{"date-parts":[[2020,10,22]],"date-time":"2020-10-22T10:04:34Z","timestamp":1603361074000},"page":"387-396","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["MORSE: MultimOdal sentiment analysis for Real-life SEttings"],"prefix":"10.1145","author":[{"given":"Yiqun","family":"Yao","sequence":"first","affiliation":[{"name":"University of Michigan-Ann Arbor, Ann Arbor, MI, USA"}]},{"given":"Ver\u00f3nica","family":"P\u00e9rez-Rosas","sequence":"additional","affiliation":[{"name":"University of Michigan-Ann Arbor, Ann Arbor, MI, USA"}]},{"given":"Mohamed","family":"Abouelenien","sequence":"additional","affiliation":[{"name":"University of Michigan-Dearborn, Dearborn, MI, USA"}]},{"given":"Mihai","family":"Burzo","sequence":"additional","affiliation":[{"name":"University of Michigan-Flint, Flint, MI, USA"}]}],"member":"320","published-online":{"date-parts":[[2020,10,22]]},"reference":[{"volume-title":"Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012)","author":"Abouelenien M.","key":"e_1_3_2_2_1_1","unstructured":"M. Abouelenien and X. Yuan . 2012. 
SampleBoost: Improving boosting performance by destabilizing weak learners based on weighted error analysis. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). 585--588."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372422.3373592"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_2_4_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_3_2_2_5_1","volume-title":"Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1633"},{"key":"e_1_3_2_2_7_1","volume-title":"IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation 42, 4","author":"Busso Carlos","year":"2008","unstructured":"Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. 2008. 
IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation 42, 4 (2008), 335."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1021\/ci0341161"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2018.00020"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.121"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.321"},{"volume-title":"COVAREP - A collaborative voice analysis repository for speech technologies. In 2014 ieee international conference on acoustics, speech and signal processing (icassp)","author":"Degottex Gilles","key":"e_1_3_2_2_13_1","unstructured":"Gilles Degottex, John Kane, Thomas Drugman, Tuomo Raitio, and Stefan Scherer. 2014. COVAREP - A collaborative voice analysis repository for speech technologies. In 2014 ieee international conference on acoustics, speech and signal processing (icassp). IEEE, 960--964."},{"key":"e_1_3_2_2_14_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_2_15_1","volume-title":"Supervised Multimodal Bitransformers for Classifying Images and Text. arXiv preprint arXiv:1909.02950","author":"Davide Testuggine Douwe Kiela Hamed Firooz","year":"2019","unstructured":"Hamed Firooz Davide Testuggine Douwe Kiela, Suvrat Bhooshan. 2019. Supervised Multimodal Bitransformers for Classifying Images and Text. arXiv preprint arXiv:1909.02950 (2019)."},{"key":"e_1_3_2_2_16_1","unstructured":"Faceplusplus. [n.d.]. Face Detection API. https:\/\/www.faceplusplus.com\/facedetection\/"},{"key":"e_1_3_2_2_17_1","volume-title":"Facial action coding system: a technique for the measurement of facial movement. Palo Alto 3","author":"Friesen E","year":"1978","unstructured":"E Friesen and Paul Ekman. 1978. Facial action coding system: a technique for the measurement of facial movement. Palo Alto 3 (1978)."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1280"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1193"},{"key":"e_1_3_2_2_20_1","volume-title":"Improving neural networks by preventing coadaptation of feature detectors. arXiv preprint arXiv:1207.0580","author":"Hinton Geoffrey E","year":"2012","unstructured":"Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. 
Improving neural networks by preventing coadaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)."},{"key":"e_1_3_2_2_21_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_2_22_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_2_23_1","volume-title":"Visualbert: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557","author":"Li Liunian Harold","year":"2019","unstructured":"Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, and Kai-Wei Chang. 2019. Visualbert: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557 (2019)."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1267"},{"key":"e_1_3_2_2_25_1","volume-title":"Multimodal Language Analysis with Recurrent Multistage Fusion. 
In EMNLP 2018: 2018 Conference on Empirical Methods in Natural Language Processing. 150--161","author":"Liang Paul Pu","year":"2018","unstructured":"Paul Pu Liang, Ziyin Liu, AmirAli Bagher Zadeh, and Louis-Philippe Morency. 2018. Multimodal Language Analysis with Recurrent Multistage Fusion. In EMNLP 2018: 2018 Conference on Empirical Methods in Natural Language Processing. 150--161."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S19-2011"},{"key":"e_1_3_2_2_27_1","volume-title":"Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101","author":"Loshchilov Ilya","year":"2017","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)."},{"key":"e_1_3_2_2_28_1","volume-title":"Interrater reliability: the kappa statistic. Biochemia medica: Biochemia medica 22, 3","author":"McHugh Mary L","year":"2012","unstructured":"Mary L McHugh. 2012. Interrater reliability: the kappa statistic. Biochemia medica: Biochemia medica 22, 3 (2012), 276--282."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104425"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJKESDP.2011.039875"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_2_2_32_1","volume-title":"Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets. 
arXiv preprint arXiv:1906.05474","author":"Peng Yifan","year":"2019","unstructured":"Yifan Peng, Shankai Yan, and Zhiyong Lu. 2019. Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets. arXiv preprint arXiv:1906.05474 (2019)."},{"key":"e_1_3_2_2_33_1","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 973--982","author":"P\u00e9rez-Rosas Ver\u00f3nica","year":"2013","unstructured":"Ver\u00f3nica P\u00e9rez-Rosas, Rada Mihalcea, and Louis-Philippe Morency. 2013. Utterance-level multimodal sentiment analysis. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 973--982."},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_2_2_35_1","volume-title":"Paul Pu Liang, and Barnabas Poczos","author":"Pham Hai","year":"2018","unstructured":"Hai Pham, Thomas Manzini, Paul Pu Liang, and Barnabas Poczos. 2018. Seq2seq2sentiment: Multimodal sequence to sequence models for sentiment analysis. 
arXiv preprint arXiv:1807.03915 (2018)."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1050"},{"key":"e_1_3_2_2_37_1","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_2_38_1","volume-title":"Amir Zadeh, Louis-Philippe Morency, and Mohammed Ehsan Hoque.","author":"Rahman Wasifur","year":"2019","unstructured":"Wasifur Rahman, Md Kamrul Hasan, Amir Zadeh, Louis-Philippe Morency, and Mohammed Ehsan Hoque. 2019. M-BERT: Injecting Multimodal Information in the BERT Structure. arXiv preprint arXiv:1908.05787 (2019)."},{"key":"e_1_3_2_2_39_1","volume-title":"Adapt or get left behind: Domain adaptation through bert language model finetuning for aspect-target sentiment classification. arXiv preprint arXiv:1908.11860","author":"Rietzler Alexander","year":"2019","unstructured":"Alexander Rietzler, Sebastian Stabinger, Paul Opitz, and Stefan Engl. 2019. Adapt or get left behind: Domain adaptation through bert language model finetuning for aspect-target sentiment classification. 
arXiv preprint arXiv:1908.11860 (2019)."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2009.2029559"},{"key":"e_1_3_2_2_41_1","unstructured":"Sainbayar Sukhbaatar Jason Weston Rob Fergus et al. 2015. End-to-end memory networks. In Advances in neural information processing systems. 2440--2448."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00756"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/3298023.3298188"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"key":"e_1_3_2_2_45_1","volume-title":"Amir Zadeh, Louis-Philippe Morency, and Ruslan Salakhutdinov.","author":"Hubert Tsai Yao-Hung","year":"2018","unstructured":"Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2018. Learning factorized multimodal representations. arXiv preprint arXiv:1806.06176 (2018)."},{"volume-title":"Nonlinear Modeling","author":"Vapnik Vladimir","key":"e_1_3_2_2_46_1","unstructured":"Vladimir Vapnik. 1998. The support vector method of function estimation. In Nonlinear Modeling. Springer, 55--85."},{"key":"e_1_3_2_2_47_1","unstructured":"
Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_3_2_2_48_1","volume-title":"5th International Conference on Language Resources and Evaluation (LREC","author":"Wittenburg Peter","year":"2006","unstructured":"Peter Wittenburg, Hennie Brugman, Albert Russel, Alex Klassmann, and Han Sloetjes. 2006. ELAN: a professional framework for multimodality research. In 5th International Conference on Language Resources and Evaluation (LREC 2006). 1556--1559."},{"key":"e_1_3_2_2_49_1","volume-title":"International conference on machine learning. 2048--2057","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning. 2048--2057."},{"key":"e_1_3_2_2_50_1","volume-title":"Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems. 5754--5764.","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. 
In Advances in neural information processing systems. 5754--5764."},{"key":"e_1_3_2_2_51_1","unstructured":"Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey Jeff Klingner Apurva Shah Melvin Johnson Xiaobing Liu Lukasz Kaiser Stephan Gouws Yoshikiyo Kato Taku Kudo Hideto Kazawa Keith Stevens George Kurian Nishant Patil Wei Wang Cliff Young Jason Smith Jason Riesa Alex Rudnick Oriol Vinyals Greg Corrado Macduff Hughes Jeffrey Dean Yonghui Wu Mike Schuster. 2019. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144 (2019)."},{"key":"e_1_3_2_2_52_1","volume-title":"Memory Fusion Network for Multi-view Sequential Learning. In AAAI-18 AAAI Conference on Artificial Intelligence. 5634--5641","author":"Zadeh Amir","year":"2018","unstructured":"
Amir Zadeh, Paul Pu Liang, Navonil Mazumder, Soujanya Poria, Erik Cambria, and Louisphilippe Morency. 2018. Memory Fusion Network for Multi-view Sequential Learning. In AAAI-18 AAAI Conference on Artificial Intelligence. 5634--5641."},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2016.94"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1208"},{"key":"e_1_3_2_2_55_1","volume-title":"Unified vision-language pre-training for image captioning and vqa. arXiv preprint arXiv:1909.11059","author":"Zhou Luowei","year":"2019","unstructured":"Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J Corso, and Jianfeng Gao. 2019. Unified vision-language pre-training for image captioning and vqa. arXiv preprint arXiv:1909.11059 (2019)."},{"key":"e_1_3_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.11"}],"event":{"name":"ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Virtual Event Netherlands","acronym":"ICMI '20"},"container-title":["Proceedings of the 2020 International Conference on Multimodal 
Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3382507.3418821","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3382507.3418821","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:38:27Z","timestamp":1750199907000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3382507.3418821"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,21]]},"references-count":56,"alternative-id":["10.1145\/3382507.3418821","10.1145\/3382507"],"URL":"https:\/\/doi.org\/10.1145\/3382507.3418821","relation":{},"subject":[],"published":{"date-parts":[[2020,10,21]]},"assertion":[{"value":"2020-10-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}