{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,6]],"date-time":"2025-11-06T11:45:01Z","timestamp":1762429501408,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548034","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:46Z","timestamp":1665416566000},"page":"769-778","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Clustering Generative Adversarial Networks for Story Visualization"],"prefix":"10.1145","author":[{"given":"Bowen","family":"Li","sequence":"first","affiliation":[{"name":"University of Oxford, Oxford, United Kingdom"}]},{"given":"Philip H. S.","family":"Torr","sequence":"additional","affiliation":[{"name":"University of Oxford, Oxford, United Kingdom"}]},{"given":"Thomas","family":"Lukasiewicz","sequence":"additional","affiliation":[{"name":"TU Wien & University of Oxford, Oxford, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"9758","article-title":"Self-supervised learning by cross-modal audio-video clustering","volume":"33","author":"Alwassel Humam","year":"2020","unstructured":"Humam Alwassel , Dhruv Mahajan , Bruno Korbar , Lorenzo Torresani , Bernard Ghanem , and Du Tran . 2020 . Self-supervised learning by cross-modal audio-video clustering . Advances in Neural Information Processing Systems 33 (2020), 9758 -- 9770 . 
Humam Alwassel, Dhruv Mahajan, Bruno Korbar, Lorenzo Torresani, Bernard Ghanem, and Du Tran. 2020. Self-supervised learning by cross-modal audio-video clustering. Advances in Neural Information Processing Systems 33 (2020), 9758--9770.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_2_1","first-page":"4660","article-title":"Labelling unlabelled videos from scratch with multi-modal self-supervision","volume":"33","author":"Asano Yuki","year":"2020","unstructured":"Yuki Asano , Mandela Patrick , Christian Rupprecht , and Andrea Vedaldi . 2020 . Labelling unlabelled videos from scratch with multi-modal self-supervision . Advances in Neural Information Processing Systems 33 (2020), 4660 -- 4671 . Yuki Asano, Mandela Patrick, Christian Rupprecht, and Andrea Vedaldi. 2020. Labelling unlabelled videos from scratch with multi-modal self-supervision. Advances in Neural Information Processing Systems 33 (2020), 4660--4671.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_3_1","volume-title":"Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371","author":"Asano Yuki Markus","year":"2019","unstructured":"Yuki Markus Asano , Christian Rupprecht , and Andrea Vedaldi . 2019. Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371 ( 2019 ). Yuki Markus Asano, Christian Rupprecht, and Andrea Vedaldi. 2019. Self-labelling via simultaneous clustering and representation learning. arXiv preprint arXiv:1911.05371 (2019)."},{"key":"e_1_3_2_1_4_1","first-page":"2","article-title":"Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis","volume":"1","author":"Balaji Yogesh","year":"2019","unstructured":"Yogesh Balaji , Martin Renqiang Min , Bing Bai , Rama Chellappa , and Hans Peter Graf . 2019 . Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis . 
In IJCAI , Vol. 1. 2 . Yogesh Balaji, Martin Renqiang Min, Bing Bai, Rama Chellappa, and Hans Peter Graf. 2019. Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis. In IJCAI, Vol. 1. 2.","journal-title":"IJCAI"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_9"},{"key":"e_1_3_2_1_6_1","first-page":"9912","article-title":"Unsupervised learning of visual features by contrasting cluster assignments","volume":"33","author":"Caron Mathilde","year":"2020","unstructured":"Mathilde Caron , Ishan Misra , Julien Mairal , Priya Goyal , Piotr Bojanowski , and Armand Joulin . 2020 . Unsupervised learning of visual features by contrasting cluster assignments . Advances in Neural Information Processing Systems 33 (2020), 9912 -- 9924 . Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. 2020. Unsupervised learning of visual features by contrasting cluster assignments. Advances in Neural Information Processing Systems 33 (2020), 9912--9924.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00791"},{"key":"e_1_3_2_1_8_1","volume-title":"International conference on machine learning. PMLR, 1597--1607","author":"Chen Ting","year":"2020","unstructured":"Ting Chen , Simon Kornblith , Mohammad Norouzi , and Geoffrey Hinton . 2020 . A simple framework for contrastive learning of visual representations . In International conference on machine learning. PMLR, 1597--1607 . Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. 
PMLR, 1597--1607."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00520"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.608"},{"key":"e_1_3_2_1_11_1","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672--2680.  Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 2672--2680."},{"key":"e_1_3_2_1_12_1","volume-title":"International Conference on Machine Learning. PMLR, 1462--1471","author":"Gregor Karol","year":"2015","unstructured":"Karol Gregor , Ivo Danihelka , Alex Graves , Danilo Rezende , and Daan Wierstra . 2015 . Draw: A recurrent neural network for image generation . In International Conference on Machine Learning. PMLR, 1462--1471 . Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Rezende, and Daan Wierstra. 2015. Draw: A recurrent neural network for image generation. In International Conference on Machine Learning. PMLR, 1462--1471."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_37"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"e_1_3_2_1_15_1","unstructured":"Martin Heusel Hubert Ramsauer Thomas Unterthiner Bernhard Nessler and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems. 6626--6637.  Martin Heusel Hubert Ramsauer Thomas Unterthiner Bernhard Nessler and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems. 
6626--6637."},{"key":"e_1_3_2_1_16_1","volume-title":"Semantic object accuracy for generative Text-to-Image synthesis. arXiv preprint arXiv:1910.13321","author":"Hinz Tobias","year":"2019","unstructured":"Tobias Hinz , Stefan Heinrich , and Stefan Wermter. 2019. Semantic object accuracy for generative Text-to-Image synthesis. arXiv preprint arXiv:1910.13321 ( 2019 ). Tobias Hinz, Stefan Heinrich, and Stefan Wermter. 2019. Semantic object accuracy for generative Text-to-Image synthesis. arXiv preprint arXiv:1910.13321 (2019)."},{"key":"e_1_3_2_1_17_1","volume-title":"Large-scale representation learning from visually grounded untranscribed speech. arXiv preprint arXiv:1909.08782","author":"Ilharco Gabriel","year":"2019","unstructured":"Gabriel Ilharco , Yuan Zhang , and Jason Baldridge . 2019. Large-scale representation learning from visually grounded untranscribed speech. arXiv preprint arXiv:1909.08782 ( 2019 ). Gabriel Ilharco, Yuan Zhang, and Jason Baldridge. 2019. Large-scale representation learning from visually grounded untranscribed speech. arXiv preprint arXiv:1909.08782 (2019)."},{"key":"e_1_3_2_1_18_1","first-page":"21357","article-title":"Contragan: Contrastive learning for conditional image generation","volume":"33","author":"Kang Minguk","year":"2020","unstructured":"Minguk Kang and Jaesik Park . 2020 . Contragan: Contrastive learning for conditional image generation . Advances in Neural Information Processing Systems 33 (2020), 21357 -- 21369 . Minguk Kang and Jaesik Park. 2020. Contragan: Contrastive learning for conditional image generation. Advances in Neural Information Processing Systems 33 (2020), 21357--21369.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_19_1","volume-title":"Deepstory: Video story qa by deep embedded memory networks. arXiv preprint arXiv:1707.00836","author":"Kim Kyung-Min","year":"2017","unstructured":"Kyung-Min Kim , Min-Oh Heo , Seong-Ho Choi , and Byoung-Tak Zhang . 2017 . 
Deepstory: Video story qa by deep embedded memory networks. arXiv preprint arXiv:1707.00836 (2017). Kyung-Min Kim, Min-Oh Heo, Seong-Ho Choi, and Byoung-Tak Zhang. 2017. Deepstory: Video story qa by deep embedded memory networks. arXiv preprint arXiv:1707.00836 (2017)."},{"key":"e_1_3_2_1_20_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba . 2014 . Adam : A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_21_1","volume-title":"Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114","author":"Kingma Diederik P","year":"2013","unstructured":"Diederik P Kingma and Max Welling . 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 ( 2013 ). Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00399"},{"key":"e_1_3_2_1_23_1","unstructured":"Bowen Li Xiaojuan Qi Thomas Lukasiewicz and Philip Torr. 2019. Controllable text-to-image generation. In Advances in Neural Information Processing Systems. 2063--2073.  Bowen Li Xiaojuan Qi Thomas Lukasiewicz and Philip Torr. 2019. Controllable text-to-image generation. In Advances in Neural Information Processing Systems. 2063--2073."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00790"},{"key":"e_1_3_2_1_25_1","volume-title":"Image-to-Image Translation with Text Guidance. arXiv preprint arXiv:2002.05235","author":"Li Bowen","year":"2020","unstructured":"Bowen Li , Xiaojuan Qi , Philip Torr , and Thomas Lukasiewicz . 2020. Image-to-Image Translation with Text Guidance. arXiv preprint arXiv:2002.05235 ( 2020 ). Bowen Li, Xiaojuan Qi, Philip Torr, and Thomas Lukasiewicz. 2020. 
Image-to-Image Translation with Text Guidance. arXiv preprint arXiv:2002.05235 (2020)."},{"key":"e_1_3_2_1_26_1","first-page":"22020","article-title":"Lightweight generative adversarial networks for text-guided image manipulation","volume":"33","author":"Li Bowen","year":"2020","unstructured":"Bowen Li , Xiaojuan Qi , Philip Torr , and Thomas Lukasiewicz . 2020 . Lightweight generative adversarial networks for text-guided image manipulation . Advances in Neural Information Processing Systems 33 (2020), 22020 -- 22031 . Bowen Li, Xiaojuan Qi, Philip Torr, and Thomas Lukasiewicz. 2020. Lightweight generative adversarial networks for text-guided image manipulation. Advances in Neural Information Processing Systems 33 (2020), 22020--22031.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_27_1","unstructured":"Bowen Li Philip Torr and Thomas Lukasiewicz. 2021. Memory-Driven Text-to-Image Generation. (2021).  Bowen Li Philip Torr and Thomas Lukasiewicz. 2021. Memory-Driven Text-to-Image Generation. (2021)."},{"key":"e_1_3_2_1_28_1","volume-title":"Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966","author":"Li Junnan","year":"2020","unstructured":"Junnan Li , Pan Zhou , Caiming Xiong , and Steven CH Hoi . 2020. Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966 ( 2020 ). Junnan Li, Pan Zhou, Caiming Xiong, and Steven CH Hoi. 2020. Prototypical contrastive learning of unsupervised representations. arXiv preprint arXiv:2005.04966 (2020)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00649"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12233"},{"key":"e_1_3_2_1_31_1","volume-title":"Linguistic and Commonsense Structure into Story Visualization. 
arXiv preprint arXiv:2110.10834","author":"Maharana Adyasha","year":"2021","unstructured":"Adyasha Maharana and Mohit Bansal . 2021. Integrating Visuospatial , Linguistic and Commonsense Structure into Story Visualization. arXiv preprint arXiv:2110.10834 ( 2021 ). Adyasha Maharana and Mohit Bansal. 2021. Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization. arXiv preprint arXiv:2110.10834 (2021)."},{"key":"e_1_3_2_1_32_1","volume-title":"Improving Generation and Evaluation of Visual Stories via Semantic Consistency. arXiv preprint arXiv:2105.10026","author":"Maharana Adyasha","year":"2021","unstructured":"Adyasha Maharana , Darryl Hannan , and Mohit Bansal . 2021. Improving Generation and Evaluation of Visual Stories via Semantic Consistency. arXiv preprint arXiv:2105.10026 ( 2021 ). Adyasha Maharana, Darryl Hannan, and Mohit Bansal. 2021. Improving Generation and Evaluation of Visual Stories via Semantic Consistency. arXiv preprint arXiv:2105.10026 (2021)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA51294.2020.00014"},{"key":"e_1_3_2_1_34_1","unstructured":"Seonghyeon Nam Yunji Kim and Seon Joo Kim. 2018. Text-adaptive generative adversarial networks: Manipulating images with natural language. In Advances in Neural Information Processing Systems. 42--51.  Seonghyeon Nam Yunji Kim and Seon Joo Kim. 2018. Text-adaptive generative adversarial networks: Manipulating images with natural language. In Advances in Neural Information Processing Systems. 42--51."},{"key":"e_1_3_2_1_35_1","volume-title":"Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741","author":"Nichol Alex","year":"2021","unstructured":"Alex Nichol , Prafulla Dhariwal , Aditya Ramesh , Pranav Shyam , Pamela Mishkin , Bob McGrew , Ilya Sutskever , and Mark Chen . 2021 . Glide: Towards photorealistic image generation and editing with text-guided diffusion models. 
arXiv preprint arXiv:2112.10741 (2021). Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021)."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3127905"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58545-7_19"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00160"},{"key":"e_1_3_2_1_39_1","volume-title":"International Conference on Machine Learning. PMLR, 8821--8831","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh , Mikhail Pavlov , Gabriel Goh , Scott Gray , Chelsea Voss , Alec Radford , Mark Chen , and Ilya Sutskever . 2021 . Zero-shot text-to-image generation . In International Conference on Machine Learning. PMLR, 8821--8831 . Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-shot text-to-image generation. In International Conference on Machine Learning. PMLR, 8821--8831."},{"key":"e_1_3_2_1_40_1","volume-title":"Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396","author":"Reed Scott","year":"2016","unstructured":"Scott Reed , Zeynep Akata , Xinchen Yan , Lajanugen Logeswaran , Bernt Schiele , and Honglak Lee . 2016. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396 ( 2016 ). Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396 (2016)."},{"key":"e_1_3_2_1_41_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_42_1","volume-title":"Character-Preserving Coherent Story Visualization. In European Conference on Computer Vision. Springer, 18--33","author":"Song Yun-Zhu","year":"2020","unstructured":"Yun-Zhu Song , Zhi Rui Tam , Hung-Jen Chen , Huiao-Han Lu , and Hong-Han Shuai . 2020 . Character-Preserving Coherent Story Visualization. In European Conference on Computer Vision. Springer, 18--33 . Yun-Zhu Song, Zhi Rui Tam, Hung-Jen Chen, Huiao-Han Lu, and Hong-Han Shuai. 2020. Character-Preserving Coherent Story Visualization. In European Conference on Computer Vision. Springer, 18--33."},{"key":"e_1_3_2_1_43_1","volume-title":"DF-GAN: Deep fusion generative adversarial networks for text-to-image synthesis. arXiv preprint arXiv:2008.05865","author":"Tao Ming","year":"2020","unstructured":"Ming Tao , Hao Tang , Songsong Wu , Nicu Sebe , Xiao-Yuan Jing , Fei Wu , and Bingkun Bao . 2020. DF-GAN: Deep fusion generative adversarial networks for text-to-image synthesis. arXiv preprint arXiv:2008.05865 ( 2020 ). Ming Tao, Hao Tang, Songsong Wu, Nicu Sebe, Xiao-Yuan Jing, Fei Wu, and Bingkun Bao. 2020. DF-GAN: Deep fusion generative adversarial networks for text-to-image synthesis. arXiv preprint arXiv:2008.05865 (2020)."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00675"},{"key":"e_1_3_2_1_45_1","unstructured":"Aaron Van den Oord Nal Kalchbrenner Lasse Espeholt Oriol Vinyals Alex Graves et al. 2016. Conditional image generation with pixelcnn decoders. In Advances in neural information processing systems. 4790--4798.  Aaron Van den Oord Nal Kalchbrenner Lasse Espeholt Oriol Vinyals Alex Graves et al. 2016. Conditional image generation with pixelcnn decoders. 
In Advances in neural information processing systems. 4790--4798."},{"key":"e_1_3_2_1_46_1","volume-title":"Representation learning with contrastive predictive coding. arXiv e-prints","author":"den Oord Aaron Van","year":"2018","unstructured":"Aaron Van den Oord , Yazhe Li , and Oriol Vinyals . 2018. Representation learning with contrastive predictive coding. arXiv e-prints ( 2018 ), arXiv--1807. Aaron Van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv e-prints (2018), arXiv--1807."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58607-2_16"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00654"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00089"},{"volume-title":"Proceedings of the IEEE International Conference on Computer Vision. 5907--5915","author":"Zhang Han","key":"e_1_3_2_1_51_1","unstructured":"Han Zhang , Tao Xu , Hongsheng Li , Shaoting Zhang , Xiaogang Wang , Xiaolei Huang , and Dimitris N. Metaxas . 2017. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks . In Proceedings of the IEEE International Conference on Computer Vision. 5907--5915 . Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N. Metaxas. 2017. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 5907--5915."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2856256"},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00595"},{"key":"e_1_3_2_1_54_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
3009--3016","author":"Lawrence Zitnick C","year":"2013","unstructured":"C Lawrence Zitnick and Devi Parikh . 2013 . Bringing semantics into focus using visual abstraction . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3009--3016 . C Lawrence Zitnick and Devi Parikh. 2013. Bringing semantics into focus using visual abstraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3009--3016."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.211"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548034","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548034","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:29Z","timestamp":1750186949000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548034"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":55,"alternative-id":["10.1145\/3503161.3548034","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548034","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}