{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T02:26:10Z","timestamp":1767666370920,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":50,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,6,5]],"date-time":"2019-06-05T00:00:00Z","timestamp":1559692800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Leibniz-Gemeinschaft","award":["K68\/2017"],"award-info":[{"award-number":["K68\/2017"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,6,5]]},"DOI":"10.1145\/3323873.3325049","type":"proceedings-article","created":{"date-parts":[[2019,6,10]],"date-time":"2019-06-10T12:10:58Z","timestamp":1560168658000},"page":"168-176","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["Understanding, Categorizing and Predicting Semantic Image-Text Relations"],"prefix":"10.1145","author":[{"given":"Christian","family":"Otto","sequence":"first","affiliation":[{"name":"Leibniz Information Centre for Science and Technology (TIB), Hannover, Germany"}]},{"given":"Matthias","family":"Springstein","sequence":"additional","affiliation":[{"name":"Leibniz Information Centre for Science and Technology (TIB), Hannover, Germany"}]},{"given":"Avishek","family":"Anand","sequence":"additional","affiliation":[{"name":"L3S Research Center, Leibniz Universit\u00e4t Hannover, Hannover, Germany"}]},{"given":"Ralph","family":"Ewerth","sequence":"additional","affiliation":[{"name":"Leibniz Information Centre for Science and Technology (TIB) & L3S Research Center, Leibniz Universit\u00e4t Hannover, Hannover, 
Germany"}]}],"member":"320","published-online":{"date-parts":[[2019,6,5]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_1_2_1","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings . http:\/\/arxiv.org\/abs\/1409","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , and Yoshua Bengio . 2015 . Neural Machine Translation by Jointly Learning to Align and Translate . In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings . http:\/\/arxiv.org\/abs\/1409 .0473 Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings . http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 28--36","author":"Alexander Kotov Saeid","year":"2018","unstructured":"Saeid Balaneshin-kordan and Alexander Kotov . 2018 . Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images . In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 28--36 . Saeid Balaneshin-kordan and Alexander Kotov. 2018. Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 28--36."},{"key":"e_1_3_2_1_4_1","volume-title":"Multimodal machine learning: A survey and taxonomy","author":"Baltru\u0161aitis Tadas","year":"2018","unstructured":"Tadas Baltru\u0161aitis, Chaitanya Ahuja , and Louis-Philippe Morency . 2018. 
Multimodal machine learning: A survey and taxonomy . IEEE Transactions on Pattern Analysis and Machine Intelligence ( 2018 ). Tadas Baltru\u0161aitis, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018)."},{"key":"e_1_3_2_1_5_1","volume-title":"and trans. S. Heath","author":"Barthes Roland","year":"1977","unstructured":"Roland Barthes . 1977. Image-Music-Text , ed. and trans. S. Heath , London : Fontana , Vol. 332 ( 1977 ). Roland Barthes. 1977. Image-Music-Text, ed. and trans. S. Heath, London: Fontana, Vol. 332 (1977)."},{"volume-title":"Text and image: A critical introduction to the visual\/verbal divide","author":"Bateman John","key":"e_1_3_2_1_6_1","unstructured":"John Bateman . 2014. Text and image: A critical introduction to the visual\/verbal divide . Routledge . John Bateman. 2014. Text and image: A critical introduction to the visual\/verbal divide .Routledge."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1515\/9783110479898"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.212"},{"key":"e_1_3_2_1_9_1","unstructured":"Abadi et al. 2015a. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/  Abadi et al. 2015a. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123369"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2021071"},{"key":"e_1_3_2_1_13_1","series-title":"Proceedings of the 15th Conference of the European","volume-title":"Short Papers . 427--431. 
https:\/\/aclanthology.info\/papers\/E17--2068\/e17--2068","author":"Grave Edouard","year":"2017","unstructured":"Edouard Grave , Tomas Mikolov , Armand Joulin , and Piotr Bojanowski . 2017. Bag of Tricks for Efficient Text Classification . In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017 , Valencia, Spain, April 3--7, 2017, Volume 2 : Short Papers . 427--431. https:\/\/aclanthology.info\/papers\/E17--2068\/e17--2068 Edouard Grave, Tomas Mikolov, Armand Joulin, and Piotr Bojanowski. 2017. Bag of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, April 3--7, 2017, Volume 2: Short Papers . 427--431. https:\/\/aclanthology.info\/papers\/E17--2068\/e17--2068"},{"volume-title":"Halliday's introduction to functional grammar","author":"Kirkwood Halliday Michael Alexander","key":"e_1_3_2_1_14_1","unstructured":"Michael Alexander Kirkwood Halliday and Christian MIM Matthiessen . 2013. Halliday's introduction to functional grammar . Routledge . Michael Alexander Kirkwood Halliday and Christian MIM Matthiessen. 2013. Halliday's introduction to functional grammar .Routledge."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078971.3078991"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_1_18_1","volume-title":"Visual Storytelling. Conference of the North American Chapter of the Association for Computational Linguistics .","author":"Huang Ting-Hao K.","year":"2016","unstructured":"Ting-Hao K. Huang , Francis Ferraro , Nasrin Mostafazadeh , Ishan Misra , Jacob Devlin , Aishwarya Agrawal , Ross Girshick , Xiaodong He , Pushmeet Kohli , Dhruv Batra , 2016 . Visual Storytelling. 
Conference of the North American Chapter of the Association for Computational Linguistics . Ting-Hao K. Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Aishwarya Agrawal, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, et al. 2016. Visual Storytelling. Conference of the North American Chapter of the Association for Computational Linguistics ."},{"key":"e_1_3_2_1_19_1","volume-title":"Automatic Understanding of Image and Video Advertisements. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017","author":"Hussain Zaeem","year":"2017","unstructured":"Zaeem Hussain , Mingda Zhang , Xiaozhong Zhang , Keren Ye , Christopher Thomas , Zuha Agha , Nathan Ong , and Adriana Kovashka . 2017 . Automatic Understanding of Image and Video Advertisements. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 , Honolulu, HI, USA, July 21--26 , 2017. 1100--1110. Zaeem Hussain, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas, Zuha Agha, Nathan Ong, and Adriana Kovashka. 2017. Automatic Understanding of Image and Video Advertisements. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21--26, 2017. 1100--1110."},{"key":"e_1_3_2_1_20_1","volume-title":"Proc. NIPS Workshop on Multimodal Machine Learning","volume":"898","author":"Jaques Natasha","year":"2015","unstructured":"Natasha Jaques , Sara Taylor , Akane Sano , and Rosalind Picard . 2015 . Multi-task, multi-kernel learning for estimating individual wellbeing . In Proc. NIPS Workshop on Multimodal Machine Learning , Montreal, Quebec , Vol. 898 . Natasha Jaques, Sara Taylor, Akane Sano, and Rosalind Picard. 2015. Multi-task, multi-kernel learning for estimating individual wellbeing. In Proc. NIPS Workshop on Multimodal Machine Learning, Montreal, Quebec, Vol. 
898."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.494"},{"key":"e_1_3_2_1_22_1","unstructured":"Andrej Karpathy Armand Joulin and Fei Fei F Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping. Advances in Neural Information Processing Systems .   Andrej Karpathy Armand Joulin and Fei Fei F Li. 2014. Deep fragment embeddings for bidirectional image sentence mapping. Advances in Neural Information Processing Systems ."},{"key":"e_1_3_2_1_23_1","volume-title":"Educational and Psychological Measurement","volume":"30","author":"Krippendorff Klaus","year":"1970","unstructured":"Klaus Krippendorff . 1970 . Estimating the reliability, systematic error and random error of interval data . Educational and Psychological Measurement , Vol. 30 , 1 (1970). Klaus Krippendorff. 1970. Estimating the reliability, systematic error and random error of interval data. Educational and Psychological Measurement, Vol. 30, 1 (1970)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123366"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911527"},{"volume-title":"European Conference on Computer Vision, David Fleet, Tomas Pajdla","author":"Lin Tsung-Yi","key":"e_1_3_2_1_26_1","unstructured":"Tsung-Yi Lin , Michael Maire , Serge Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r , and C. Lawrence Zitnick . 2014. Microsoft COCO: Common Objects in Context . In European Conference on Computer Vision, David Fleet, Tomas Pajdla , Bernt Schiele , and Tinne Tuytelaars (Eds.). Springer International Publishing , Cham, 740--755. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision, David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars (Eds.). 
Springer International Publishing, Cham, 740--755."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2013.2285378"},{"key":"e_1_3_2_1_28_1","volume-title":"Visual Communication","volume":"4","author":"Martinec Radan","year":"2005","unstructured":"Radan Martinec and Andrew Salway . 2005 . A system for image-text relations in new (and old) media . Visual Communication , Vol. 4 (2005). Radan Martinec and Andrew Salway. 2005. A system for image-text relations in new (and old) media . Visual Communication, Vol. 4 (2005)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967210"},{"key":"e_1_3_2_1_30_1","volume-title":"Mass","author":"McCloud Scott","year":"1993","unstructured":"Scott McCloud . 1993. Understanding comics: The invisible art. Northampton , Mass ( 1993 ). Scott McCloud. 1993. Understanding comics: The invisible art. Northampton, Mass (1993)."},{"key":"e_1_3_2_1_31_1","unstructured":"Tomas Mikolov Ilya Sutskever Kai Chen Greg S Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems .   Tomas Mikolov Ilya Sutskever Kai Chen Greg S Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems ."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3206025.3206064"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240712"},{"volume-title":"Handbook of semiotics","author":"N\u00f6th Winfried","key":"e_1_3_2_1_34_1","unstructured":"Winfried N\u00f6th . 1995. Handbook of semiotics . Indiana University Press . Winfried N\u00f6th. 1995. Handbook of semiotics .Indiana University Press."},{"key":"e_1_3_2_1_35_1","unstructured":"My English Pages. 2017--11--23. List of antonyms and opposites. 
http:\/\/www.myenglishpages.com\/site_php_files\/vocabulary-lesson-opposites.php  My English Pages. 2017--11--23. List of antonyms and opposites. http:\/\/www.myenglishpages.com\/site_php_files\/vocabulary-lesson-opposites.php"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1303"},{"key":"e_1_3_2_1_37_1","volume-title":"Life-long Cross-media Correlation Learning. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM.","author":"Qi Jinwei","year":"2018","unstructured":"Jinwei Qi , Yuxin Peng , and Yunkan Zhuo . 2018 . Life-long Cross-media Correlation Learning. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM. Jinwei Qi, Yuxin Peng, and Yunkan Zhuo. 2018. Life-long Cross-media Correlation Learning. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2984066"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964321"},{"key":"e_1_3_2_1_41_1","volume-title":"Black Holes and White Rabbits : Metaphor Identification with Visual Features . Naacl","author":"Shutova Ekaterina","year":"2016","unstructured":"Ekaterina Shutova , Douwe Kiela , and Jean Maillard . 2016. Black Holes and White Rabbits : Metaphor Identification with Visual Features . Naacl ( 2016 ). Ekaterina Shutova, Douwe Kiela, and Jean Maillard. 2016. Black Holes and White Rabbits : Metaphor Identification with Visual Features . Naacl (2016)."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.895972"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Christian Szegedy Sergey Ioffe Vincent Vanhoucke and Alexander A Alemi. 2017. Inception-v4 Inception-ResNet and the Impact of Residual Connections on Learning. AAAI .  Christian Szegedy Sergey Ioffe Vincent Vanhoucke and Alexander A Alemi. 2017. 
Inception-v4 Inception-ResNet and the Impact of Residual Connections on Learning. AAAI .","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"e_1_3_2_1_44_1","volume-title":"Proceedings of the 33rd International Systemic Functional Congress. 1165--1205","author":"Unsworth Len","year":"2007","unstructured":"Len Unsworth . 2007 . Image\/text relations and intersemiosis: Towards multimodal text description for multiliteracies education . In Proceedings of the 33rd International Systemic Functional Congress. 1165--1205 . Len Unsworth. 2007. Image\/text relations and intersemiosis: Towards multimodal text description for multiliteracies education. In Proceedings of the 33rd International Systemic Functional Congress. 1165--1205."},{"volume-title":"Introducing Social Semiotics","author":"Leeuwen Theo Van","key":"e_1_3_2_1_45_1","unstructured":"Theo Van Leeuwen . 2005. Introducing Social Semiotics . Psychology Press . Theo Van Leeuwen. 2005. Introducing Social Semiotics .Psychology Press."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3133142"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3206025.3206033"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Zichao Yang Diyi Yang Chris Dyer Xiaodong He Alexander J Smola and Eduard H Hovy. 2016. Hierarchical Attention Networks for Document Classification. North American Chapter of the Association for Computational Linguistics: Human Language Technologies .  Zichao Yang Diyi Yang Chris Dyer Xiaodong He Alexander J Smola and Eduard H Hovy. 2016. Hierarchical Attention Networks for Document Classification. 
North American Chapter of the Association for Computational Linguistics: Human Language Technologies .","DOI":"10.18653\/v1\/N16-1174"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2188783"},{"key":"e_1_3_2_1_50_1","volume-title":"Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text. In British Machine Vision Conference 2018, BMVC 2018","author":"Zhang Mingda","year":"2018","unstructured":"Mingda Zhang , Rebecca Hwa , and Adriana Kovashka . 2018 . Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text. In British Machine Vision Conference 2018, BMVC 2018 , Northumbria University, Newcastle, UK, September 3--6 , 2018 . 8. http:\/\/bmvc2018.org\/contents\/papers\/0228.pdf Mingda Zhang, Rebecca Hwa, and Adriana Kovashka. 2018. Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text. In British Machine Vision Conference 2018, BMVC 2018, Northumbria University, Newcastle, UK, September 3--6, 2018 . 8. 
http:\/\/bmvc2018.org\/contents\/papers\/0228.pdf"}],"event":{"name":"ICMR '19: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Ottawa ON Canada","acronym":"ICMR '19"},"container-title":["Proceedings of the 2019 on International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3323873.3325049","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3323873.3325049","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:12Z","timestamp":1750204452000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3323873.3325049"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,5]]},"references-count":50,"alternative-id":["10.1145\/3323873.3325049","10.1145\/3323873"],"URL":"https:\/\/doi.org\/10.1145\/3323873.3325049","relation":{},"subject":[],"published":{"date-parts":[[2019,6,5]]},"assertion":[{"value":"2019-06-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}