{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:10:37Z","timestamp":1759133437368,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":86,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3475343","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T20:00:05Z","timestamp":1634587205000},"page":"1893-1902","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Text as Neural Operator:Image Manipulation by Text Instruction"],"prefix":"10.1145","author":[{"given":"Tianhao","family":"Zhang","sequence":"first","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hung-Yu","family":"Tseng","sequence":"additional","affiliation":[{"name":"University of California, Merced, Merced, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lu","family":"Jiang","sequence":"additional","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weilong","family":"Yang","sequence":"additional","affiliation":[{"name":"Waymo, Mountain View, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Honglak","family":"Lee","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, MI, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Irfan","family":"Essa","sequence":"additional","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"International Conference on Machine Learning.","author":"Arjovsky Martin","year":"2017","unstructured":"Martin Arjovsky , Soumith Chintala , and L\u00e9on Bottou . 2017 . Wasserstein gan . In International Conference on Machine Learning. Martin Arjovsky, Soumith Chintala, and L\u00e9on Bottou. 2017. Wasserstein gan. In International Conference on Machine Learning."},{"key":"e_1_3_2_2_2_1","volume-title":"International Conference on Learning Representations.","author":"Brock Andrew","year":"2019","unstructured":"Andrew Brock , Jeff Donahue , and Karen Simonyan . 2019 . Large scale gan training for high fidelity natural image synthesis . In International Conference on Learning Representations. Andrew Brock, Jeff Donahue, and Karen Simonyan. 2019. Large scale gan training for high fidelity natural image synthesis. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00012"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350571"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_11"},{"key":"e_1_3_2_2_6_1","volume-title":"Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval. In European Conference on Computer Vision.","author":"Chen Yanbei","year":"2020","unstructured":"Yanbei Chen and Loris Bazzani . 2020 . Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval. In European Conference on Computer Vision. Yanbei Chen and Loris Bazzani. 2020. Learning Joint Visual Semantic Matching Embeddings for Language-guided Retrieval. In European Conference on Computer Vision."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00307"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00307"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413551"},{"key":"e_1_3_2_2_10_1","volume-title":"The Cityscapes Dataset for Semantic Urban Scene Understanding. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Cordts Marius","year":"2016","unstructured":"Marius Cordts , Mohamed Omran , Sebastian Ramos , Timo Rehfeld , Markus Enzweiler , Rodrigo Benenson , Uwe Franke , Stefan Roth , and Bernt Schiele . 2016 . The Cityscapes Dataset for Semantic Urban Scene Understanding. In IEEE Conference on Computer Vision and Pattern Recognition. Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The Cityscapes Dataset for Semantic Urban Scene Understanding. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_11_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"volume-title":"IEEE International Conference on Computer Vision.","author":"El-Nouby Alaaeldin","key":"e_1_3_2_2_12_1","unstructured":"Alaaeldin El-Nouby , Shikhar Sharma , Hannes Schulz , Devon Hjelm , Layla El Asri , Samira Ebrahimi Kahou , Yoshua Bengio , and Graham W. Taylor . 2019. Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction . In IEEE International Conference on Computer Vision. Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, and Graham W. Taylor. 2019. Tell, Draw, and Repeat: Generating and modifying images based on continual linguistic instruction. In IEEE International Conference on Computer Vision."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2975961"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00126"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969125"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3326943.3327006"},{"key":"e_1_3_2_2_17_1","volume-title":"Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback. arXiv preprint arXiv:1905.12794","author":"Guo Xiaoxiao","year":"2019","unstructured":"Xiaoxiao Guo , Hui Wu , Yupeng Gao , Steven Rennie , and Rogerio Feris . 2019 . Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback. arXiv preprint arXiv:1905.12794 (2019). Xiaoxiao Guo, Hui Wu, Yupeng Gao, Steven Rennie, and Rogerio Feris. 2019. Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback. arXiv preprint arXiv:1905.12794 (2019)."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295408"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327195"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.167"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_11"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_5"},{"key":"e_1_3_2_2_24_1","volume-title":"Image-to-Image Translation with Conditional Adversarial Networks. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Isola Phillip","year":"2017","unstructured":"Phillip Isola , Jun-Yan Zhu , Tinghui Zhou , and Alexei A Efros . 2017 . Image-to-Image Translation with Conditional Adversarial Networks. In IEEE Conference on Computer Vision and Pattern Recognition. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_25_1","volume-title":"International Conference on Learning Representations.","author":"Jang Eric","year":"2017","unstructured":"Eric Jang , Shixiang Gu , and Ben Poole . 2017 . Categorical reparameterization with gumbel-softmax . In International Conference on Learning Representations. Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparameterization with gumbel-softmax. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00133"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition.","author":"Johnson Justin","key":"e_1_3_2_2_27_1","unstructured":"Justin Johnson , Bharath Hariharan , Laurens van der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. 2017a. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning . In IEEE Conference on Computer Vision and Pattern Recognition. Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C Lawrence Zitnick, and Ross Girshick. 2017a. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.325"},{"key":"e_1_3_2_2_29_1","volume-title":"CoDraw: Collaborative drawing as a testbed for grounded goal-driven communication. arXiv preprint arXiv:1712.05558","author":"Kim Jin-Hwa","year":"2017","unstructured":"Jin-Hwa Kim , Nikita Kitaev , Xinlei Chen , Marcus Rohrbach , Byoung-Tak Zhang , Yuandong Tian , Dhruv Batra , and Devi Parikh . 2017. CoDraw: Collaborative drawing as a testbed for grounded goal-driven communication. arXiv preprint arXiv:1712.05558 ( 2017 ). Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, and Devi Parikh. 2017. CoDraw: Collaborative drawing as a testbed for grounded goal-driven communication. arXiv preprint arXiv:1712.05558 (2017)."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157137"},{"key":"e_1_3_2_2_31_1","volume-title":"International Conference on Learning Representations.","author":"Kingma Diederik P","year":"2015","unstructured":"Diederik P Kingma and Jimmy Ba . 2015 . Adam: A method for stochastic optimization . In International Conference on Learning Representations. Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354723"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2470654.2481301"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01284-z"},{"key":"e_1_3_2_2_35_1","unstructured":"Hsin-Ying Lee Xiaodong Yang Ming-Yu Liu Ting-Chun Wang Yu-Ding Lu Ming-Hsuan Yang and Jan Kautz. 2019. Dancing to Music. In Neural Information Processing Systems.  Hsin-Ying Lee Xiaodong Yang Ming-Yu Liu Ting-Chun Wang Yu-Ding Lu Ming-Hsuan Yang and Jan Kautz. 2019. Dancing to Music. In Neural Information Processing Systems."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454472"},{"key":"e_1_3_2_2_37_1","volume-title":"ManiGAN: Text-Guided Image Manipulation. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Li Bowen","year":"2020","unstructured":"Bowen Li , Xiaojuan Qi , Thomas Lukasiewicz , and Philip HS Torr . 2020 b . ManiGAN: Text-Guided Image Manipulation. In IEEE Conference on Computer Vision and Pattern Recognition. Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, and Philip HS Torr. 2020 b. ManiGAN: Text-Guided Image Manipulation. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00432"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01245"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413684"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00649"},{"key":"e_1_3_2_2_42_1","volume-title":"Controllable and Progressive Image Extrapolation. In IEEE\/CVF Winter Conference on Applications of Computer Vision.","author":"Li Yijun","year":"2021","unstructured":"Yijun Li , Lu Jiang , and Ming-Hsuan Yang . 2021 . Controllable and Progressive Image Extrapolation. In IEEE\/CVF Winter Conference on Applications of Computer Vision. Yijun Li, Lu Jiang, and Ming-Hsuan Yang. 2021. Controllable and Progressive Image Extrapolation. In IEEE\/CVF Winter Conference on Applications of Computer Vision."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_28"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2890628"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413982"},{"key":"e_1_3_2_2_46_1","volume-title":"Attention-Based Spatial Guidance for Image-to-Image Translation. In IEEE\/CVF Winter Conference on Applications of Computer Vision. 816--825","author":"Lin Yu","year":"2021","unstructured":"Yu Lin , Yigong Wang , Yifan Li , Yang Gao , Zhuoyi Wang , and Latifur Khan . 2021 . Attention-Based Spatial Guidance for Image-to-Image Translation. In IEEE\/CVF Winter Conference on Applications of Computer Vision. 816--825 . Yu Lin, Yigong Wang, Yifan Li, Yang Gao, Zhuoyi Wang, and Latifur Khan. 2021. Attention-Based Spatial Guidance for Image-to-Image Translation. In IEEE\/CVF Winter Conference on Applications of Computer Vision. 816--825."},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413505"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/3294771.3294810"},{"key":"e_1_3_2_2_49_1","volume-title":"Program-Guided Image Manipulators. In IEEE International Conference on Computer Vision.","author":"Mao Jiayuan","year":"2019","unstructured":"Jiayuan Mao , Xiuming Zhang , Yikai Li , William T Freeman , Joshua B Tenenbaum , and Jiajun Wu . 2019 . Program-Guided Image Manipulators. In IEEE International Conference on Computer Vision. Jiayuan Mao, Xiuming Zhang, Yikai Li, William T Freeman, Joshua B Tenenbaum, and Jiajun Wu. 2019. Program-Guided Image Manipulators. In IEEE International Conference on Computer Vision."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.304"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327286"},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/3326943.3326948"},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.11"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00244"},{"key":"e_1_3_2_2_55_1","volume-title":"NeurIPS workshop.","author":"Paszke Adam","year":"2017","unstructured":"Adam Paszke , Sam Gross , Soumith Chintala , Gregory Chanan , Edward Yang , Zachary DeVito , Zeming Lin , Alban Desmaison , Luca Antiga , and Adam Lerer . 2017 . Automatic differentiation in pytorch . In NeurIPS workshop. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch. In NeurIPS workshop."},{"key":"e_1_3_2_2_56_1","volume-title":"AAAI Conference on Artificial Intelligence.","author":"Perez Ethan","year":"2018","unstructured":"Ethan Perez , Florian Strub , Harm De Vries , Vincent Dumoulin , and Aaron Courville . 2018 . Film: Visual reasoning with a general conditioning layer . In AAAI Conference on Artificial Intelligence. Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. 2018. Film: Visual reasoning with a general conditioning layer. In AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201393"},{"key":"e_1_3_2_2_58_1","volume-title":"International Conference on Learning Representations.","author":"Rosenbaum Clemens","year":"2018","unstructured":"Clemens Rosenbaum , Tim Klinger , and Matthew Riemer . 2018 . Routing networks: Adaptive selection of non-linear functions for multi-task learning . In International Conference on Learning Representations. Clemens Rosenbaum, Tim Klinger, and Matthew Riemer. 2018. Routing networks: Adaptive selection of non-linear functions for multi-task learning. In International Conference on Learning Representations."},{"key":"e_1_3_2_2_59_1","unstructured":"Cella Lao Rousseau. 2017. Bye-bye basic editing hello voice-controlled Photoshop! https:\/\/www.imore.com\/bye-bye-basic-editing-hello-voice-controlled-photoshop  Cella Lao Rousseau. 2017. Bye-bye basic editing hello voice-controlled Photoshop! https:\/\/www.imore.com\/bye-bye-basic-editing-hello-voice-controlled-photoshop"},{"key":"e_1_3_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295250"},{"key":"e_1_3_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/800049.801812"},{"key":"e_1_3_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2997012"},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58523-5_10"},{"key":"e_1_3_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00783"},{"key":"e_1_3_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58598-3_15"},{"key":"e_1_3_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.437"},{"key":"e_1_3_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00660"},{"key":"e_1_3_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00917"},{"key":"e_1_3_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350944"},{"key":"e_1_3_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00143"},{"key":"e_1_3_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454642"},{"key":"e_1_3_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240559"},{"key":"e_1_3_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2856256"},{"key":"e_1_3_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240594"},{"key":"e_1_3_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414017"},{"key":"e_1_3_2_2_77_1","volume-title":"Colorful Image Colorization. In European Conference on Computer Vision.","author":"Zhang Richard","year":"2016","unstructured":"Richard Zhang , Phillip Isola , and Alexei A Efros . 2016 . Colorful Image Colorization. In European Conference on Computer Vision. Richard Zhang, Phillip Isola, and Alexei A Efros. 2016. Colorful Image Colorization. In European Conference on Computer Vision."},{"key":"e_1_3_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073703"},{"key":"e_1_3_2_2_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413939"},{"key":"e_1_3_2_2_80_1","volume-title":"Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search. In IEEE Conference on Computer Vision and Pattern Recognition.","author":"Zhao Bo","year":"2017","unstructured":"Bo Zhao , Jiashi Feng , Xiao Wu , and Shuicheng Yan . 2017 . Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search. In IEEE Conference on Computer Vision and Pattern Recognition. Bo Zhao, Jiashi Feng, Xiao Wu, and Shuicheng Yan. 2017. Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search. In IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_2_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350946"},{"key":"e_1_3_2_2_82_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"},{"key":"e_1_3_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.5555\/3294771.3294816"},{"key":"e_1_3_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00595"},{"key":"e_1_3_2_2_85_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.387"},{"key":"e_1_3_2_2_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356561"}],"event":{"name":"MM '21: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Virtual Event China","acronym":"MM '21"},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475343","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3475343","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:18Z","timestamp":1750193358000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475343"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":86,"alternative-id":["10.1145\/3474085.3475343","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3475343","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}