{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:16:11Z","timestamp":1750220171655,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548345","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:43:12Z","timestamp":1665416592000},"page":"482-490","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Spatial-Temporal Aligned Multi-Agent Learning for Visual Dialog Systems"],"prefix":"10.1145","author":[{"given":"Yong","family":"Zhuang","sequence":"first","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Tong","family":"Yu","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Junda","family":"Wu","sequence":"additional","affiliation":[{"name":"New York University, New York City, NY, USA"}]},{"given":"Shiqu","family":"Wu","sequence":"additional","affiliation":[{"name":"University of California, San Diego, La Jolla, CA, USA"}]},{"given":"Shuai","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_2_2_1","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer 
Vision. 1140--1149","author":"Anwaar Muhammad Umer","year":"2020","unstructured":"Muhammad Umer Anwaar , Egor Labintcev , and Martin Kleinsteuber . 2020 . Compositional Learning of Image-Text Query for Image Retrieval . In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 1140--1149 . Muhammad Umer Anwaar, Egor Labintcev, and Martin Kleinsteuber. 2020. Compositional Learning of Image-Text Query for Image Retrieval. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 1140--1149."},{"key":"e_1_3_2_2_3_1","volume-title":"Science","volume":"359","author":"Brown Noam","year":"2018","unstructured":"Noam Brown and Tuomas Sandholm . 2018 . Superhuman AI for heads-up no-limit poker: Libratus beats top professionals . Science , Vol. 359 , 6374 (2018), 418--424. Noam Brown and Tuomas Sandholm. 2018. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, Vol. 359, 6374 (2018), 418--424."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.121"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.321"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462899"},{"key":"e_1_3_2_2_7_1","unstructured":"Xiaoxiao Guo Hui Wu Yu Cheng Steven Rennie Gerald Tesauro and Rogerio Feris. 2018. Dialog-based interactive image retrieval. In Advances in neural information processing systems. 678--688.  Xiaoxiao Guo Hui Wu Yu Cheng Steven Rennie Gerald Tesauro and Rogerio Feris. 2018. Dialog-based interactive image retrieval. In Advances in neural information processing systems. 678--688."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00067"},{"key":"e_1_3_2_2_9_1","volume-title":"Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. 
https:\/\/arxiv.org\/abs\/1602.07332","author":"Krishna Ranjay","year":"2016","unstructured":"Ranjay Krishna , Yuke Zhu , Oliver Groth , Justin Johnson , Kenji Hata , Joshua Kravitz , Stephanie Chen , Yannis Kalantidis , Li-Jia Li , David A Shamma , Michael Bernstein , and Li Fei-Fei . 2016 . Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. https:\/\/arxiv.org\/abs\/1602.07332 Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, Michael Bernstein, and Li Fei-Fei. 2016. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. https:\/\/arxiv.org\/abs\/1602.07332"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.685"},{"key":"e_1_3_2_2_11_1","volume-title":"Convlab: Multi-domain end-to-end dialog system platform. arXiv preprint arXiv:1904.08637","author":"Lee Sungjin","year":"2019","unstructured":"Sungjin Lee , Qi Zhu , Ryuichi Takanobu , Xiang Li , Yaoqin Zhang , Zheng Zhang , Jinchao Li , Baolin Peng , Xiujun Li , Minlie Huang , 2019 . Convlab: Multi-domain end-to-end dialog system platform. arXiv preprint arXiv:1904.08637 (2019). Sungjin Lee, Qi Zhu, Ryuichi Takanobu, Xiang Li, Yaoqin Zhang, Zheng Zhang, Jinchao Li, Baolin Peng, Xiujun Li, Minlie Huang, et al. 2019. Convlab: Multi-domain end-to-end dialog system platform. 
arXiv preprint arXiv:1904.08637 (2019)."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1187"},{"key":"e_1_3_2_2_14_1","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Lu Xiaopeng","year":"2021","unstructured":"Xiaopeng Lu , Tiancheng Zhao , and Kyusong Lee . 2021. VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) . Association for Computational Linguistics , Online , 5020--5029. https:\/\/doi.org\/10.18653\/v1\/2021.acl-long.389 Xiaopeng Lu, Tiancheng Zhao, and Kyusong Lee. 2021. VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Online, 5020--5029. https:\/\/doi.org\/10.18653\/v1\/2021.acl-long.389"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/3463952.3464054"},{"key":"e_1_3_2_2_16_1","first-page":"7613","article-title":"MAVEN: Multi-Agent Variational Exploration","volume":"32","author":"Mahajan Anuj","year":"2019","unstructured":"Anuj Mahajan , Tabish Rashid , Mikayel Samvelyan , and Shimon Whiteson . 2019 . MAVEN: Multi-Agent Variational Exploration . Advances in Neural Information Processing Systems , Vol. 32 (2019), 7613 -- 7624 . Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, and Shimon Whiteson. 2019. 
MAVEN: Multi-Agent Variational Exploration. Advances in Neural Information Processing Systems, Vol. 32 (2019), 7613--7624.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1147"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1152"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01251"},{"key":"e_1_3_2_2_20_1","volume-title":"Collaborative multi-agent dialogue model training via reinforcement learning. arXiv preprint arXiv:1907.05507","author":"Papangelis Alexandros","year":"2019","unstructured":"Alexandros Papangelis , Yi-Chia Wang , Piero Molino , and Gokhan Tur . 2019. Collaborative multi-agent dialogue model training via reinforcement learning. arXiv preprint arXiv:1907.05507 ( 2019 ). Alexandros Papangelis, Yi-Chia Wang, Piero Molino, and Gokhan Tur. 2019. Collaborative multi-agent dialogue model training via reinforcement learning. arXiv preprint arXiv:1907.05507 (2019)."},{"key":"e_1_3_2_2_21_1","unstructured":"Dragomir R Radev Hong Qi Harris Wu and Weiguo Fan. 2002. Evaluating Web-based Question Answering Systems.. In LREC. Citeseer.  Dragomir R Radev Hong Qi Harris Wu and Weiguo Fan. 2002. Evaluating Web-based Question Answering Systems.. In LREC. Citeseer."},{"key":"e_1_3_2_2_22_1","volume-title":"International Conference on Machine Learning. PMLR, 4295--4304","author":"Rashid Tabish","year":"2018","unstructured":"Tabish Rashid , Mikayel Samvelyan , Christian Schroeder , Gregory Farquhar , Jakob Foerster , and Shimon Whiteson . 2018 . Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning . In International Conference on Machine Learning. PMLR, 4295--4304 . Tabish Rashid, Mikayel Samvelyan, Christian Schroeder, Gregory Farquhar, Jakob Foerster, and Shimon Whiteson. 2018. 
Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In International Conference on Machine Learning. PMLR, 4295--4304."},{"key":"e_1_3_2_2_23_1","volume-title":"Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren , Kaiming He , Ross Girshick , and Jian Sun . 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems , Vol. 28 ( 2015 ), 91--99. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems, Vol. 28 (2015), 91--99."},{"key":"e_1_3_2_2_24_1","volume-title":"Proceedings of the 2021 Conference of the North American","author":"Schneider Florian","year":"2021","unstructured":"Florian Schneider , \u00d6zge Ala\u00e7am , Xintong Wang , and Chris Biemann . 2021. Towards Multi-Modal Text-Image Retrieval to improve Human Reading . In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. Association for Computational Linguistics , Online . https:\/\/aclanthology.org\/ 2021 .naacl-srw.21 Florian Schneider, \u00d6zge Ala\u00e7am, Xintong Wang, and Chris Biemann. 2021. Towards Multi-Modal Text-Image Retrieval to improve Human Reading. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop. Association for Computational Linguistics, Online. 
https:\/\/aclanthology.org\/2021.naacl-srw.21"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1206"},{"key":"e_1_3_2_2_26_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al.","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris J Maddison , Arthur Guez , Laurent Sifre , George Van Den Driessche , Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016 . Mastering the game of Go with deep neural networks and tree search. nature, Vol. 529 , 7587 (2016), 484--489. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. nature, Vol. 529, 7587 (2016), 484--489."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"David Silver Julian Schrittwieser Karen Simonyan Ioannis Antonoglou Aja Huang Arthur Guez Thomas Hubert Lucas Baker Matthew Lai Adrian Bolton etal 2017. Mastering the game of go without human knowledge. nature Vol. 550 7676 (2017) 354--359.  David Silver Julian Schrittwieser Karen Simonyan Ioannis Antonoglou Aja Huang Arthur Guez Thomas Hubert Lucas Baker Matthew Lai Adrian Bolton et al. 2017. Mastering the game of go without human knowledge. nature Vol. 550 7676 (2017) 354--359.","DOI":"10.1038\/nature24270"},{"key":"e_1_3_2_2_28_1","volume-title":"End-To-End Memory Networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015","author":"Sukhbaatar Sainbayar","year":"2015","unstructured":"Sainbayar Sukhbaatar , Arthur Szlam , Jason Weston , and Rob Fergus . 2015 . End-To-End Memory Networks. 
In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015 , December 7-12, 2015, Montreal, Quebec, Canada,, Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.). 2440--2448. Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. 2015. End-To-End Memory Networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada,, Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.). 2440--2448."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.59"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.59"},{"key":"e_1_3_2_2_31_1","volume-title":"Drill-down: Interactive retrieval of complex scenes using natural language queries. In Advances in Neural Information Processing Systems.","author":"Tan Fuwen","year":"2019","unstructured":"Fuwen Tan , Paola Cascante-Bonilla , Xiaoxiao Guo , Hui Wu , Song Feng , and Vicente Ordonez . 2019 . Drill-down: Interactive retrieval of complex scenes using natural language queries. In Advances in Neural Information Processing Systems. Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, and Vicente Ordonez. 2019. Drill-down: Interactive retrieval of complex scenes using natural language queries. In Advances in Neural Information Processing Systems."},{"volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 1417--1427","author":"Teney Damien","key":"e_1_3_2_2_32_1","unstructured":"Damien Teney , Ehsan Abbasnejad , and Anton van den Hengel. 2021. Unshuffling data for improved generalization in visual question answering . In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 1417--1427 . 
Damien Teney, Ehsan Abbasnejad, and Anton van den Hengel. 2021. Unshuffling data for improved generalization in visual question answering. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 1417--1427."},{"key":"e_1_3_2_2_33_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Weston Jason","year":"2015","unstructured":"Jason Weston , Sumit Chopra , and Antoine Bordes . 2015. Memory Networks . In 3rd International Conference on Learning Representations, ICLR 2015 , San Diego, CA , USA, May 7-9, 2015, Conference Track Proceedings,, Yoshua Bengio and Yann LeCun (Eds .). http:\/\/arxiv.org\/abs\/1410.3916 Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory Networks. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings,, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1410.3916"},{"key":"e_1_3_2_2_34_1","volume-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning","author":"Williams Ronald J","year":"1992","unstructured":"Ronald J Williams . 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning , Vol. 8 , 3 ( 1992 ), 229--256. Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, Vol. 
8, 3 (1992), 229--256."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475366"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00165"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3476969"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00256"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330991"},{"key":"e_1_3_2_2_40_1","volume-title":"Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser. In Findings of the Association for Computational Linguistics: EMNLP 2021. 1839","author":"Zheng Duo","year":"2021","unstructured":"Duo Zheng , Zipeng Xu , Fandong Meng , Xiaojie Wang , Jiaan Wang , and Jie Zhou . 2021 . Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser. In Findings of the Association for Computational Linguistics: EMNLP 2021. 1839 --1851. Duo Zheng, Zipeng Xu, Fandong Meng, Xiaojie Wang, Jiaan Wang, and Jie Zhou. 2021. Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser. In Findings of the Association for Computational Linguistics: EMNLP 2021. 
1839--1851."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460426.3463647"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548345","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548345","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:43Z","timestamp":1750186843000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548345"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":41,"alternative-id":["10.1145\/3503161.3548345","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548345","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}