{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T06:01:03Z","timestamp":1777615263318,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":42,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100006512","name":"Nanyang Technological University","doi-asserted-by":"publisher","award":["NTU?ACE2020-01"],"award-info":[{"award-number":["NTU?ACE2020-01"]}],"id":[{"id":"10.13039\/501100006512","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Natural Science Foundation of China","award":["61971457, 7191101302"],"award-info":[{"award-number":["61971457, 7191101302"]}]},{"name":"Energy Market Authority of Singapore","award":["NRF2017EWT-EP003-023"],"award-info":[{"award-number":["NRF2017EWT-EP003-023"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3413582","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T12:27:35Z","timestamp":1602505655000},"page":"430-438","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning"],"prefix":"10.1145","author":[{"given":"Huaizheng","family":"Zhang","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yong","family":"Luo","sequence":"additional","affiliation":[{"name":"Nanyang Technological University & Peng Cheng Laboratory, Singapore, 
Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiming","family":"Ai","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yonggang","family":"Wen","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Han","family":"Hu","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Peter Anderson Xiaodong He Chris Buehler Damien Teney Mark Johnson Stephen Gould and Lei Zhang. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR. 6077--6086.","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Javad Azimi Ruofei Zhang Yang Zhou Vidhya Navalpakkam Jianchang Mao and Xiaoli Fern. 2012. Visual appearance of display ads and its effect on click through rate. In CIKM. ACM 495--504.","DOI":"10.1145\/2396761.2396826"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"crossref","unstructured":"
Piotr Bojanowski Edouard Grave Armand Joulin and Tomas Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5 (2017) 135--146.","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350888"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Zhao-Min Chen Xiu-Shen Wei Peng Wang and Yanwen Guo. 2019. Multi-Label Image Recognition with Graph Convolutional Networks. In CVPR. 5177--5186.","DOI":"10.1109\/CVPR.2019.00532"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Haibin Cheng Roelof van Zwol Javad Azimi Eren Manavoglu Ruofei Zhang Yang Zhou and Vidhya Navalpakkam. 2012. Multimedia features for click prediction of new ads in display advertising. In SIGKDD. ACM 777--785.","DOI":"10.1145\/2339530.2339652"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964326"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Jia Deng Wei Dong Richard Socher Li-Jia Li Kai Li and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In CVPR. IEEE 248--255.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2006.03.002"},{"key":"e_1_3_2_2_10_1","unstructured":"
Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Gao Huang Zhuang Liu Laurens Van Der Maaten and Kilian Q Weinberger. 2017. Densely connected convolutional networks. In CVPR. 4700--4708.","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Zaeem Hussain Mingda Zhang Xiaozhong Zhang Keren Ye Christopher Thomas Zuha Agha Nathan Ong and Adriana Kovashka. 2017. Automatic understanding of image and video advertisements. In CVPR. 1705--1715.","DOI":"10.1109\/CVPR.2017.123"},{"key":"e_1_3_2_2_13_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).","author":"Kingma Diederik P","year":"2014"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Ranjay Krishna Yuke Zhu Oliver Groth Justin Johnson Kenji Hata Joshua Kravitz Stephanie Chen Yannis Kalantidis Li-Jia Li David A Shamma et al. 2017. Visual genome: Connecting language and vision using crowdsourced dense image annotations. 
IJCV 123 1 (2017) 32--73.","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240649"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Lemao Liu Andrew Finch Masao Utiyama and Eiichiro Sumita. 2016. Agreement on target-bidirectional LSTMs for sequence-to-sequence learning. In AAAI.","DOI":"10.1609\/aaai.v30i1.10327"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2421309"},{"key":"e_1_3_2_2_18_1","first-page":"414","article-title":"Large margin multi-modal multi-task feature extraction for image classification","volume":"1","author":"Luo Yong","year":"2015","journal-title":"IEEE Transactions on Image Processing25"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Rishi Madhok Shashank Mujumdar Nitin Gupta and Sameep Mehta. 2018. Semantic Understanding for Contextual In-Video Advertising. In AAAI.","DOI":"10.1609\/aaai.v32i1.12133"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Daniel McDuff Rana El Kaliouby Jeffrey F Cohn and Rosalind W Picard. 2014. Predicting ad liking and purchase intent: Large-scale analysis of facial responses to ads. IEEE TAC 6 3 (2014) 223--235.","DOI":"10.1109\/TAFFC.2014.2384198"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"crossref","unstructured":"
Tao Mei Xian-Sheng Hua Linjun Yang and Shipeng Li. 2007. VideoSense: towards effective online video advertising. In ACM MM. ACM 1075--1084.","DOI":"10.1145\/1291233.1291467"},{"key":"e_1_3_2_2_22_1","volume-title":"Image Sense: Towards contextual image advertising. ACM TOMM 8, 1","author":"Mei Tao","year":"2012"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00173"},{"key":"e_1_3_2_2_24_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in PyTorch. NIPS-W (2017)."},{"key":"e_1_3_2_2_25_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS. 91--99."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Juan M S\u00e1nchez Xavier Binefa and Jordi Vitri\u00e0. 2002. Shot partitioning based recognition of tv commercials. MTAP 18 3 (2002) 233--247.","DOI":"10.1023\/A:1019996817159"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291338"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502082"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Amanpreet Singh Vivek Natarajan Meet Shah Yu Jiang Xinlei Chen Dhruv Batra Devi Parikh and Marcus Rohrbach. 2019. Towards vqa models that can read. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8317--8326.","DOI":"10.1109\/CVPR.2019.00851"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2007.4376991"},{"key":"e_1_3_2_2_31_1","unstructured":"Statista 2020. Advertising revenue of Google from 2001 to 2019. https:\/\/www.statista.com\/statistics\/266249\/advertising-revenue-of-google\/. Accessed: 2020-05-23."},{"key":"e_1_3_2_2_32_1","unstructured":"Statista 2020. Facebook's advertising revenue worldwide from 2009 to 2019. https:\/\/www.statista.com\/statistics\/271258\/facebooks-advertising-revenue-worldwide\/. Accessed: 2020-05-23."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Nikhita Vedula Wei Sun Hyunhwan Lee Harsh Gupta Mitsunori Ogihara Joseph Johnson Gang Ren and Srinivasan Parthasarathy. 2017. Multimodal Content Analysis for Effective Advertisements on YouTube. In ICDM. IEEE 1123--1128.","DOI":"10.1109\/ICDM.2017.149"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Jiang Wang Yi Yang Junhua Mao Zhiheng Huang Chang Huang and Wei Xu. 2016. 
Cnn-rnn: A unified framework for multi-label image classification. In CVPR. 2285--2294.","DOI":"10.1109\/CVPR.2016.251"},{"key":"e_1_3_2_2_35_1","volume-title":"Salad: A multimodal approach for contextual video advertising","author":"Xiang Chen","year":"2015"},{"key":"e_1_3_2_2_36_1","volume-title":"CAVVA: Computational affective video-in-video advertising","author":"Yadati Karthik","year":"2013"},{"key":"e_1_3_2_2_37_1","unstructured":"Chih-Kuan Yeh Wei-Chieh Wu Wei-Jen Ko and Yu-Chiang Frank Wang. 2017. Learning deep latent space for multi-label classification. In AAAI."},{"key":"e_1_3_2_2_38_1","unstructured":"Zheng-Jun Zha Daqing Liu Hanwang Zhang Yongdong Zhang and Feng Wu. 2019. Context-aware visual policy network for fine-grained image captioning. IEEE transactions on pattern analysis and machine intelligence (2019)."},{"key":"e_1_3_2_2_39_1","unstructured":"Zheng-Jun Zha Jiawei Liu Di Chen and Feng Wu. 2020. Adversarial attribute-text embedding for person search with natural language query. IEEE Transactions on Multimedia (2020)."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Huaizheng Zhang Linsen Dong Guanyu Gao Han Hu Yonggang Wen and Kyle Guan. 2020. DeepQoE: A Multimodal Learning Framework for Video Quality of Experience (QoE) Prediction. 
IEEE Transactions on Multimedia (2020).","DOI":"10.1109\/TMM.2020.2973828"},{"key":"e_1_3_2_2_41_1","volume-title":"Hysia: Serving DNN-Based Video-to-Retail Applications in Cloud. arXiv preprint arXiv:2006.05117 (2020).","author":"Zhang Huaizheng","year":"2020"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2011.40"}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","location":"Seattle WA USA","acronym":"MM '20","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 28th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413582","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3413582","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:14Z","timestamp":1750193234000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413582"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":42,"alternative-id":["10.1145\/3394171.3413582","10.1145\/3394171"],"URL":"https:\/\/doi.org\/10.1145\/3394171.3413582","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}