{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:57:34Z","timestamp":1760245054161,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,8]],"date-time":"2020-06-08T00:00:00Z","timestamp":1591574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"MoE-CMCC Artificial Intelligence Project","award":["MCM20190701"],"award-info":[{"award-number":["MCM20190701"]}]},{"DOI":"10.13039\/501100012659","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61906018"],"award-info":[{"award-number":["61906018"]}],"id":[{"id":"10.13039\/501100012659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,8]]},"DOI":"10.1145\/3372278.3390684","type":"proceedings-article","created":{"date-parts":[[2020,6,2]],"date-time":"2020-06-02T04:35:27Z","timestamp":1591072527000},"page":"145-153","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Image Synthesis from Locally Related Texts"],"prefix":"10.1145","author":[{"given":"Tianrui","family":"Niu","sequence":"first","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"given":"Fangxiang","family":"Feng","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"given":"Lingxuan","family":"Li","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]},{"given":"Xiaojie","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing University of Posts and Telecommunications, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2020,6,8]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"VQA: Visual Question Answering. In International Conference on Computer Vision (ICCV).","author":"Antol Stanislaw","year":"2015","unstructured":"Stanislaw Antol , Aishwarya Agrawal , Jiasen Lu , Margaret Mitchell , Dhruv Batra , C. Lawrence Zitnick , and Devi Parikh . 2015 . VQA: Visual Question Answering. In International Conference on Computer Vision (ICCV). Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual Question Answering. In International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the 2017 International Conference on Computer Vision.","author":"Hedi","year":"2017","unstructured":"Hedi Ben-younes, R\u00e9mi Cadene , Matthieu Cord , and Nicolas Thome . 2017 . MUTAN: Multimodal Tucker Fusion for Visual Question Answering . In Proceedings of the 2017 International Conference on Computer Vision. Hedi Ben-younes, R\u00e9mi Cadene, Matthieu Cord, and Nicolas Thome. 2017. MUTAN: Multimodal Tucker Fusion for Visual Question Answering. In Proceedings of the 2017 International Conference on Computer Vision."},{"volume-title":"AAAI Conference on Artificial Intelligence (AAAI).","author":"Cha Miriam","key":"e_1_3_2_1_3_1","unstructured":"Miriam Cha , Youngjune L. Gown , and H. T. Kung . 2019. Adversarial Learning of Semantic Relevance in Text to Image Synthesis . In AAAI Conference on Artificial Intelligence (AAAI). Miriam Cha, Youngjune L. Gown, and H. T. Kung. 2019. Adversarial Learning of Semantic Relevance in Text to Image Synthesis. In AAAI Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_3_2_1_4_1","volume-title":"Sequential Attention GAN for Interactive Image Editing via Dialogue. CoRR","author":"Cheng Yu","year":"2018","unstructured":"Yu Cheng , Zhe Gan , Yitong Li , Jingjing Liu , and Jianfeng Gao . 2018. Sequential Attention GAN for Interactive Image Editing via Dialogue. CoRR , Vol. abs\/ 1812 .08352 ( 2018 ). arxiv: 1812.08352 http:\/\/arxiv.org\/abs\/1812.08352 Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, and Jianfeng Gao. 2018. Sequential Attention GAN for Interactive Image Editing via Dialogue. CoRR, Vol. abs\/1812.08352 (2018). arxiv: 1812.08352 http:\/\/arxiv.org\/abs\/1812.08352"},{"volume-title":"The IEEE International Conference on Computer Vision (ICCV).","author":"El-Nouby Alaaeldin","key":"e_1_3_2_1_5_1","unstructured":"Alaaeldin El-Nouby , Shikhar Sharma , Hannes Schulz , Devon Hjelm , Layla El Asri , Samira Ebrahimi Kahou , Yoshua Bengio , and Graham W. Taylor . 2019. Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction . In The IEEE International Conference on Computer Vision (ICCV). Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, and Graham W. Taylor. 2019. Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction. In The IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_2_1_6_1","volume-title":"Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis. In AAAI Conference on Artificial Intelligence (AAAI).","author":"Gao Lianli","year":"2019","unstructured":"Lianli Gao , Daiyuan Chen , Jingkuan Song , Xing Xu , Dongxiang Zhang , and Heng Tao Shen . 2019 . Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis. In AAAI Conference on Artificial Intelligence (AAAI). Lianli Gao, Daiyuan Chen, Jingkuan Song, Xing Xu, Dongxiang Zhang, and Heng Tao Shen. 2019. Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis. In AAAI Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_3_2_1_7_1","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27. 2672--2680.  Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27. 2672--2680."},{"key":"e_1_3_2_1_8_1","volume-title":"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Goyal Yash","year":"2017","unstructured":"Yash Goyal , Tejas Khot , Douglas Summers-Stay , Dhruv Batra , and Devi Parikh . 2017 . Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In Conference on Computer Vision and Pattern Recognition (CVPR). Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. 2017. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_1_9_1","unstructured":"Martin Heusel Hubert Ramsauer Thomas Unterthiner Bernhard Nessler and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems 30. 6626--6637. http:\/\/papers.nips.cc\/paper\/7240-gans-trained-by-a-two-time-scale-update-rule-converge-to-a-local-nash-equilibrium.pdf  Martin Heusel Hubert Ramsauer Thomas Unterthiner Bernhard Nessler and Sepp Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Advances in Neural Information Processing Systems 30. 6626--6637. http:\/\/papers.nips.cc\/paper\/7240-gans-trained-by-a-two-time-scale-update-rule-converge-to-a-local-nash-equilibrium.pdf"},{"key":"e_1_3_2_1_10_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=H1edIiA9KQ","author":"Hinz Tobias","year":"2019","unstructured":"Tobias Hinz , Stefan Heinrich , and Stefan Wermter . 2019 . Generating Multiple Objects at Spatially Distinct Locations . In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=H1edIiA9KQ Tobias Hinz, Stefan Heinrich, and Stefan Wermter. 2019. Generating Multiple Objects at Spatially Distinct Locations. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=H1edIiA9KQ"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Seunghoon Hong Dingdong Yang Jongwook Choi and Honglak Lee. 2018. Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis. In CVPR. 7986--7994.  Seunghoon Hong Dingdong Yang Jongwook Choi and Honglak Lee. 2018. Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis. In CVPR. 7986--7994.","DOI":"10.1109\/CVPR.2018.00833"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Justin Johnson Agrim Gupta and Li Fei-Fei. 2018. Image Generation from Scene Graphs. In CVPR.  Justin Johnson Agrim Gupta and Li Fei-Fei. 2018. Image Generation from Scene Graphs. In CVPR.","DOI":"10.1109\/CVPR.2018.00133"},{"key":"e_1_3_2_1_13_1","volume-title":"CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017","author":"Johnson Justin","year":"2017","unstructured":"Justin Johnson , Bharath Hariharan , Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross B. Girshick. 2017 . CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 , Honolulu, HI, USA, July 21--26 , 2017 . 1988--1997. Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross B. Girshick. 2017. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21--26, 2017. 1988--1997."},{"key":"e_1_3_2_1_14_1","unstructured":"Wenbo Li Pengchuan Zhang Lei Zhang Qiuyuan Huang Xiaodong He Siwei Lyu and Jianfeng Gao. 2019. Object-driven Text-to-Image Synthesis via Adversarial Training. In CVPR.  Wenbo Li Pengchuan Zhang Lei Zhang Qiuyuan Huang Xiaodong He Siwei Lyu and Jianfeng Gao. 2019. Object-driven Text-to-Image Synthesis via Adversarial Training. In CVPR."},{"volume-title":"Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV) (2014-01-01)","author":"Lin Tsung-Yi","key":"e_1_3_2_1_15_1","unstructured":"Tsung-Yi Lin , Michael Maire , Serge Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r , and C. Lawrence Zitnick . 2014 . Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV) (2014-01-01) . Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV) (2014-01-01)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00519"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Tingting Qiao Jing Zhang Duanqing Xu and Dacheng Tao. 2019. MirrorGAN: Learning Text-to-image Generation by Redescription. In CVPR.  Tingting Qiao Jing Zhang Duanqing Xu and Dacheng Tao. 2019. MirrorGAN: Learning Text-to-image Generation by Redescription. In CVPR.","DOI":"10.1109\/CVPR.2019.00160"},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 33rd International Conference on International Conference on Machine Learning -","volume":"48","author":"Reed Scott","year":"2016","unstructured":"Scott Reed , Zeynep Akata , Xinchen Yan , Lajanugen Logeswaran , Bernt Schiele , and Honglak Lee . 2016 . Generative Adversarial Text to Image Synthesis . In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 (New York, NY, USA) (ICML'16). 1060--1069. Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative Adversarial Text to Image Synthesis. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48 (New York, NY, USA) (ICML'16). 1060--1069."},{"key":"e_1_3_2_1_19_1","unstructured":"Tim Salimans Ian Goodfellow Wojciech Zaremba Vicki Cheung Alec Radford Xi Chen and Xi Chen. 2016. Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems 29. 2234--2242.  Tim Salimans Ian Goodfellow Wojciech Zaremba Vicki Cheung Alec Radford Xi Chen and Xi Chen. 2016. Improved Techniques for Training GANs. In Advances in Neural Information Processing Systems 29. 2234--2242."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_2_1_21_1","volume-title":"ICLR Workshop.","author":"Sharma Shikhar","year":"2018","unstructured":"Shikhar Sharma , Dendi Suhubdy , Vincent Michalski , Samira Ebrahimi Kahou , and Yoshua Bengio . 2018 . ChatPainter: Improving Text to Image Generation using Dialogue . In ICLR Workshop. Shikhar Sharma, Dendi Suhubdy, Vincent Michalski, Samira Ebrahimi Kahou, and Yoshua Bengio. 2018. ChatPainter: Improving Text to Image Generation using Dialogue. In ICLR Workshop."},{"key":"e_1_3_2_1_22_1","volume-title":"Interactive Image Generation Using Scene Graphs. In ICLR Workshop.","author":"Sharma Shikhar","year":"2019","unstructured":"Shikhar Sharma , Dendi Suhubdy , Vincent Michalski , Samira Ebrahimi Kahou , and Yoshua Bengio . 2019 . Interactive Image Generation Using Scene Graphs. In ICLR Workshop. Shikhar Sharma, Dendi Suhubdy, Vincent Michalski, Samira Ebrahimi Kahou, and Yoshua Bengio. 2019. Interactive Image Generation Using Scene Graphs. In ICLR Workshop."},{"key":"e_1_3_2_1_23_1","unstructured":"Chenfei Wu Jinlai Liu Xiaojie Wang and Xuan Dong. 2018. Chain of Reasoning for Visual Question Answering. In Advances in Neural Information Processing Systems 31. 275--285.  Chenfei Wu Jinlai Liu Xiaojie Wang and Xuan Dong. 2018. Chain of Reasoning for Visual Question Answering. In Advances in Neural Information Processing Systems 31. 275--285."},{"key":"e_1_3_2_1_24_1","volume-title":"Differential Networks for Visual Question Answering. In AAAI Conference on Artificial Intelligence (AAAI).","author":"Wu Chenfei","year":"2019","unstructured":"Chenfei Wu , Jinlai Liu , Xiaojie Wang , and Ruifan Li . 2019 . Differential Networks for Visual Question Answering. In AAAI Conference on Artificial Intelligence (AAAI). Chenfei Wu, Jinlai Liu, Xiaojie Wang, and Ruifan Li. 2019. Differential Networks for Visual Question Answering. In AAAI Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_3_2_1_25_1","volume-title":"Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"37","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu , Jimmy Ba , Ryan Kiros , Kyunghyun Cho , Aaron Courville , Ruslan Salakhudinov , Rich Zemel , and Yoshua Bengio . 2015 . Show , Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research) , Vol. 37 . Lille, France , 2048--2057. Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 37. Lille, France, 2048--2057."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Tao Xu Pengchuan Zhang Qiuyuan Huang Han Zhang Zhe Gan Xiaolei Huang and Xiaodong He. 2018. AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks. In CVPR. 1316--1324.  Tao Xu Pengchuan Zhang Qiuyuan Huang Han Zhang Zhe Gan Xiaolei Huang and Xiaodong He. 2018. AttnGAN: Fine-Grained Text to Image Generation With Attentional Generative Adversarial Networks. In CVPR. 1316--1324.","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Han Zhang Tao Xu Hongsheng Li Shaoting Zhang Xiaolei Huang Xiaogang Wang and Dimitris Metaxas. 2017. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. In ICCV. 5908--5916.  Han Zhang Tao Xu Hongsheng Li Shaoting Zhang Xiaolei Huang Xiaogang Wang and Dimitris Metaxas. 2017. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. In ICCV. 5908--5916.","DOI":"10.1109\/ICCV.2017.629"},{"key":"e_1_3_2_1_28_1","volume-title":"Realistic Image Synthesis with Stacked Generative Adversarial Networks","author":"Zhang Han","year":"2018","unstructured":"Han Zhang , Tao Xu , Hongsheng Li , Shaoting Zhang , Xiaogang Wang , Xiaolei Huang , and Dimitris Metaxas . 2018. StackGAN+ : Realistic Image Synthesis with Stacked Generative Adversarial Networks . IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) ( 2018 ). Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas. 2018. StackGAN+: Realistic Image Synthesis with Stacked Generative Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) (2018)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Bo Zhao Lili Meng Weidong Yin and Leonid Sigal. 2019. Image Generation from Layout. In CVPR.  Bo Zhao Lili Meng Weidong Yin and Leonid Sigal. 2019. Image Generation from Layout. In CVPR.","DOI":"10.1109\/CVPR.2019.00878"}],"event":{"name":"ICMR '20: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"ICMR '20"},"container-title":["Proceedings of the 2020 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390684","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3372278.3390684","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:32:10Z","timestamp":1750195930000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390684"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,8]]},"references-count":29,"alternative-id":["10.1145\/3372278.3390684","10.1145\/3372278"],"URL":"https:\/\/doi.org\/10.1145\/3372278.3390684","relation":{},"subject":[],"published":{"date-parts":[[2020,6,8]]},"assertion":[{"value":"2020-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}