{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,9]],"date-time":"2025-11-09T03:46:52Z","timestamp":1762660012711,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61771457","61732007"],"award-info":[{"award-number":["61771457","61732007"]}]},{"name":"National Key R&D Program of China","award":["2018AAA0102003"],"award-info":[{"award-number":["2018AAA0102003"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3413777","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T13:10:18Z","timestamp":1602508218000},"page":"322-330","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning"],"prefix":"10.1145","author":[{"given":"Zhenhuan","family":"Liu","sequence":"first","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jincan","family":"Deng","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liang","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy 
of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaofei","family":"Cai","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qianqian","family":"Xu","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuhui","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qingming","family":"Huang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.608"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.01040"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018312"},{"key":"e_1_3_2_2_5_1","unstructured":"Martin Heusel Hubert Ramsauer Thomas Unterthiner Bernhard Nessler and Sepp Hochreiter. 2017. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems. 
6626--6637."},{"key":"e_1_3_2_2_6_1","volume-title":"International Conference on Machine Learning. 448--456","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning. 448--456."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.632"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2901--2910","author":"Johnson Justin","key":"e_1_3_2_2_8_1","unstructured":"Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. 2017. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2901--2910."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1651"},{"key":"e_1_3_2_2_10_1","volume-title":"Adam: A Method for Stochastic Optimization. international conference on learning representations","author":"Diederik Kingma P.","year":"2015","unstructured":"P. Diederik Kingma and Jimmy Ba. 2015. 
Adam: A Method for Stochastic Optimization. international conference on learning representations (2015)."},{"key":"e_1_3_2_2_11_1","volume-title":"Drit: Diverse image-to-image translation via disentangled representations. International Journal of Computer Vision","author":"Lee Hsin-Ying","year":"2020","unstructured":"Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, and Ming-Hsuan Yang. 2020. Drit: Diverse image-to-image translation via disentangled representations. International Journal of Computer Vision (2020), 1--16."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2194993"},{"key":"e_1_3_2_2_13_1","volume-title":"Multimodal Structure-Consistent Image-to-Image Translation. AAAI Conference on Artificial Intelligence","author":"Lin Che-Tsung","year":"2020","unstructured":"Che-Tsung Lin, Yen-Yi Wu, Po-Hao Hsu, and Shang-Hong Lai. 2020. Multimodal Structure-Consistent Image-to-Image Translation. AAAI Conference on Artificial Intelligence (2020)."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00270"},{"key":"e_1_3_2_2_15_1","volume-title":"CONFIG: Controllable Neural Face Image Generation. 
IEEE Conference on Computer Vision and Pattern Recognition","author":"Marek Kowalski","year":"2020","unstructured":"Kowalski Marek, Stephan Garbin J., Estellers Virginia, Baltruaitis Tadas, Johnson Matthew, and Shotton Jamie. 2020. CONFIG: Controllable Neural Face Image Generation. IEEE Conference on Computer Vision and Pattern Recognition (2020)."},{"key":"e_1_3_2_2_16_1","volume-title":"Conditional Generative Adversarial Nets. CoRR","author":"Mirza Mehdi","year":"2014","unstructured":"Mehdi Mirza and Simon Osindero. 2014. Conditional Generative Adversarial Nets. CoRR (2014)."},{"key":"e_1_3_2_2_17_1","volume-title":"Spectral Normalization for Generative Adversarial Networks. International Conference on Learning Representations","author":"Miyato Takeru","year":"2018","unstructured":"Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. International Conference on Learning Representations (2018)."},{"key":"e_1_3_2_2_18_1","volume-title":"International Conference on Learning Representations","author":"Miyato Takeru","year":"2018","unstructured":"Takeru Miyato and Masanori Koyama. 2018. cGANs with Projection Discriminator. 
International Conference on Learning Representations (2018)."},{"key":"e_1_3_2_2_19_1","unstructured":"Seonghyeon Nam Yunji Kim and Seon Joo Kim. 2018. Text-adaptive generative adversarial networks: manipulating images with natural language. In Advances in neural information processing systems. 42--51."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305890.3305954"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00244"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00160"},{"key":"e_1_3_2_2_24_1","volume-title":"33rd International Conference on Machine Learning. 1060--1069","author":"Reed Scott","year":"2016","unstructured":"Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative Adversarial Text to Image Synthesis. In 33rd International Conference on Machine Learning. 1060--1069."},{"key":"e_1_3_2_2_25_1","unstructured":"Tim Salimans Ian Goodfellow Wojciech Zaremba Vicki Cheung Alec Radford and Xi Chen. 2016. Improved techniques for training gans. In Advances in neural information processing systems. 
2234--2242."},{"key":"e_1_3_2_2_26_1","volume-title":"Samira Ebrahimi Kahou, and Yoshua Bengio","author":"Sharma Shikhar","year":"2018","unstructured":"Shikhar Sharma, Dendi Suhubdy, Vincent Michalski, Samira Ebrahimi Kahou, and Yoshua Bengio. 2018. Chatpainter: Improving text to image generation using dialogue. arXiv preprint arXiv:1802.08216 (2018)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.01060"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_47"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.746"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2912735"},{"key":"e_1_3_2_2_32_1","volume-title":"Attentive Normalization for Conditional Image Generation. IEEE Conference on Computer Vision and Pattern Recognition","author":"Yi Wang","year":"2020","unstructured":"Wang Yi, Chen Ying-Cong, Zhang Xiangyu, Sun Jian, and Jia Jiaya. 2020. Attentive Normalization for Conditional Image Generation. IEEE Conference on Computer Vision and Pattern Recognition (2020)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00878"},{"key":"e_1_3_2_2_34_1","volume-title":"Stackgan: Realistic image synthesis with stacked generative adversarial networks","author":"Zhang Han","year":"2018","unstructured":"Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. 2018. 
Stackgan: Realistic image synthesis with stacked generative adversarial networks. IEEE transactions on pattern analysis and machine intelligence, Vol. 41, 8 (2018), 1947--1962."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seattle WA USA","acronym":"MM '20"},"container-title":["Proceedings of the 28th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413777","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3413777","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:17Z","timestamp":1750197677000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413777"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":35,"alternative-id":["10.1145\/3394171.3413777","10.1145\/3394171"],"URL":"https:\/\/doi.org\/10.1145\/3394171.3413777","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}