{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T15:16:59Z","timestamp":1778167019043,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,15]],"date-time":"2019-10-15T00:00:00Z","timestamp":1571097600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,15]]},"DOI":"10.1145\/3343031.3350935","type":"proceedings-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T16:32:26Z","timestamp":1571675546000},"page":"39-47","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":22,"title":["Vision-Language Recommendation via Attribute Augmented Multimodal Reinforcement Learning"],"prefix":"10.1145","author":[{"given":"Tong","family":"Yu","sequence":"first","affiliation":[{"name":"Samsung Research America, Mountain View, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yilin","family":"Shen","sequence":"additional","affiliation":[{"name":"Samsung Research America, Mountain View, CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruiyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Duke University, Durham, NC, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiangyu","family":"Zeng","sequence":"additional","affiliation":[{"name":"Columbia University, New York, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongxia","family":"Jin","sequence":"additional","affiliation":[{"name":"Samsung Research America, Mountain View, CA, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2505297"},{"key":"e_1_3_2_1_3_1","volume-title":"scripts, and information-seeking strategies: On the design of interactive information retrieval systems. Expert systems with applications, 9(3):379--395","author":"Belkin Nicholas J","year":"1995","unstructured":"Nicholas J Belkin , Colleen Cool , Adelheit Stein , and Ulrich Thiel . Cases , scripts, and information-seeking strategies: On the design of interactive information retrieval systems. Expert systems with applications, 9(3):379--395 , 1995 . Nicholas J Belkin, Colleen Cool, Adelheit Stein, and Ulrich Thiel. Cases, scripts, and information-seeking strategies: On the design of interactive information retrieval systems. Expert systems with applications, 9(3):379--395, 1995."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1886063.1886114"},{"key":"e_1_3_2_1_5_1","first-page":"2249","volume-title":"Advances in neural information processing systems","author":"Chapelle Olivier","year":"2011","unstructured":"Olivier Chapelle and Lihong Li . An empirical evaluation of thompson sampling . In Advances in neural information processing systems , pages 2249 -- 2257 , 2011 . Olivier Chapelle and Lihong Li. An empirical evaluation of thompson sampling. 
In Advances in neural information processing systems, pages 2249--2257, 2011."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-015-2916-7"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939746"},{"key":"e_1_3_2_1_9_1","volume-title":"TPAMI","author":"Das Abhishek","year":"2018","unstructured":"Abhishek Das , Satwik Kottur , Khushi Gupta , Avi Singh , Deshraj Yadav , Stefan Lee , Jos\u00e9 Moura , Devi Parikh , and Dhruv Batra . Visual dialog . TPAMI , 2018 . Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, Stefan Lee, Jos\u00e9 Moura, Devi Parikh, and Dhruv Batra. Visual dialog. TPAMI, 2018."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.321"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.475"},{"key":"e_1_3_2_1_12_1","volume-title":"Openai baselines. https:\/\/github.com\/openai\/baselines","author":"Dhariwal Prafulla","year":"2017","unstructured":"Prafulla Dhariwal , Christopher Hesse , Oleg Klimov , Alex Nichol , Matthias Plappert , Alec Radford , John Schulman , Szymon Sidor , Yuhuai Wu , and Peter Zhokhov . Openai baselines. https:\/\/github.com\/openai\/baselines , 2017 . Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. Openai baselines. https:\/\/github.com\/openai\/baselines, 2017."},{"key":"e_1_3_2_1_13_1","volume-title":"Query by image and video content: The qbic system. computer, 28(9):23--32","author":"Flickner Myron","year":"1995","unstructured":"Myron Flickner , Harpreet Sawhney , Wayne Niblack , Jonathan Ashley , Qian Huang , Byron Dom , Monika Gorkani , Jim Hafner , Denis Lee , Dragutin Petkovic , Query by image and video content: The qbic system. computer, 28(9):23--32 , 1995 . 
Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, et al. Query by image and video content: The qbic system. computer, 28(9):23--32, 1995."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_1_15_1","first-page":"372","volume-title":"Conference of the Italian Association for Artificial Intelligence","author":"Greco Claudio","year":"2017","unstructured":"Claudio Greco , Alessandro Suglia , Pierpaolo Basile , and Giovanni Semeraro . Converse-et-impera : Exploiting deep learning and hierarchical reinforcement learning for conversational recommender systems . In Conference of the Italian Association for Artificial Intelligence , pages 372 -- 386 . Springer , 2017 . Claudio Greco, Alessandro Suglia, Pierpaolo Basile, and Giovanni Semeraro. Converse-et-impera: Exploiting deep learning and hierarchical reinforcement learning for conversational recommender systems. In Conference of the Italian Association for Artificial Intelligence, pages 372--386. Springer, 2017."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911453"},{"key":"e_1_3_2_1_17_1","first-page":"676","volume-title":"NIPS","author":"Guo Xiaoxiao","year":"2018","unstructured":"Xiaoxiao Guo , Hui Wu , Yu Cheng , Steven Rennie , Gerald Tesauro , and Rogerio Feris . Dialog-based interactive image retrieval . In NIPS , pages 676 -- 686 . 2018 . Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, and Rogerio Feris. Dialog-based interactive image retrieval. In NIPS, pages 676--686. 2018."},{"key":"e_1_3_2_1_18_1","volume-title":"The IEEE International Conference on Computer Vision (ICCV)","author":"Girshick He Gkioxari","year":"2016","unstructured":"Gkioxari G.-Dollar P. Girshick He , K. Mask r-cnn . In The IEEE International Conference on Computer Vision (ICCV) , 2016 . Gkioxari G.-Dollar P. Girshick He, K. Mask r-cnn. 
In The IEEE International Conference on Computer Vision (ICCV), 2016."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.493"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964309"},{"key":"e_1_3_2_1_22_1","volume-title":"1st International Workshop on Conversational Approaches to Information Retrieval (CAIR'17)","author":"Kenter Tom","year":"2017","unstructured":"Tom Kenter and Maarten de Rijke . Attentive memory networks: Efficient machine reading for conversational search . In 1st International Workshop on Conversational Approaches to Information Retrieval (CAIR'17) . ACM, 2017 . Tom Kenter and Maarten de Rijke. Attentive memory networks: Efficient machine reading for conversational search. In 1st International Workshop on Conversational Approaches to Information Retrieval (CAIR'17). ACM, 2017."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_2_1_24_1","volume-title":"ICLR","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba. Adam : A method for stochastic optimization . In ICLR , 2014 . Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2014."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.44"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-50077-5_5"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354723"},{"key":"e_1_3_2_1_28_1","first-page":"767","volume-title":"International Conference on Machine Learning","author":"Kveton Branislav","year":"2015","unstructured":"Branislav Kveton , Csaba Szepesvari , Zheng Wen , and Azin Ashkan . Cascading bandits : Learning to rank in the cascade model . In International Conference on Machine Learning , pages 767 -- 776 , 2015 . 
Branislav Kveton, Csaba Szepesvari, Zheng Wen, and Azin Ashkan. Cascading bandits: Learning to rank in the cascade model. In International Conference on Machine Learning, pages 767--776, 2015."},{"key":"e_1_3_2_1_29_1","first-page":"661","volume-title":"WWW","author":"Li Lihong","year":"2010","unstructured":"Lihong Li , Wei Chu , John Langford , and Robert E Schapire . A contextual-bandit approach to personalized news article recommendation . In WWW , pages 661 -- 670 . ACM, 2010 . Lihong Li, Wei Chu, John Langford, and Robert E Schapire. A contextual-bandit approach to personalized news article recommendation. In WWW, pages 661--670. ACM, 2010."},{"key":"e_1_3_2_1_30_1","first-page":"9725","volume-title":"Advances in Neural Information Processing Systems","author":"Li Raymond","year":"2018","unstructured":"Raymond Li , Samira Ebrahimi Kahou , Hannes Schulz , Vincent Michalski , Laurent Charlin , and Chris Pal . Towards deep conversational recommendations . In Advances in Neural Information Processing Systems , pages 9725 -- 9735 , 2018 . Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. Towards deep conversational recommendations. In Advances in Neural Information Processing Systems, pages 9725--9735, 2018."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.551"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.433"},{"key":"e_1_3_2_1_33_1","volume-title":"ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision -","author":"Fei-Fei Olga Li","year":"2008","unstructured":"Li Fei-Fei Olga Russakovsky. Learning visual attributes . In ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part I , 2008 . Li Fei-Fei Olga Russakovsky. Learning visual attributes. 
In ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part I, 2008."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126281"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020165.3020183"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2017.137"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_3_2_1_38_1","volume-title":"Image retrieval: Current techniques, promising directions, and open issues. Journal of visual communication and image representation, 10(1):39--62","author":"Rui Yong","year":"1999","unstructured":"Yong Rui , Thomas S Huang , and Shih-Fu Chang . Image retrieval: Current techniques, promising directions, and open issues. Journal of visual communication and image representation, 10(1):39--62 , 1999 . Yong Rui, Thomas S Huang, and Shih-Fu Chang. Image retrieval: Current techniques, promising directions, and open issues. Journal of visual communication and image representation, 10(1):39--62, 1999."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/76.718510"},{"key":"e_1_3_2_1_40_1","first-page":"3719","volume-title":"NIPS","author":"Seo Paul Hongsuck","year":"2017","unstructured":"Paul Hongsuck Seo , Andreas Lehrmann , Bohyung Han , and Leonid Sigal . Visual reference resolution using attention memory for visual dialog . In NIPS , pages 3719 -- 3729 , 2017 . Paul Hongsuck Seo, Andreas Lehrmann, Bohyung Han, and Leonid Sigal. Visual reference resolution using attention memory for visual dialog. 
In NIPS, pages 3719--3729, 2017."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025132.1026313"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/3172077.3172274"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210002"},{"key":"e_1_3_2_1_44_1","first-page":"1057","volume-title":"Advances in neural information processing systems","author":"Sutton Richard S","year":"2000","unstructured":"Richard S Sutton , David A McAllester , Satinder P Singh , and Yishay Mansour . Policy gradient methods for reinforcement learning with function approximation . In Advances in neural information processing systems , pages 1057 -- 1063 , 2000 . Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pages 1057--1063, 2000."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.501"},{"key":"e_1_3_2_1_46_1","first-page":"38","volume-title":"Proceedings of the ACM International Conference on Image and Video Retrieval","author":"Tellex Stefanie","unstructured":"Stefanie Tellex and Deb Roy . Towards surveillance video search by natural language query . In Proceedings of the ACM International Conference on Image and Video Retrieval , page 38 . ACM, 2009. Stefanie Tellex and Deb Roy. Towards surveillance video search by natural language query. In Proceedings of the ACM International Conference on Image and Video Retrieval, page 38. 
ACM, 2009."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13735-012-0014-4"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICoICT.2014.6914058"},{"key":"e_1_3_2_1_50_1","first-page":"1009","volume-title":"ICPR","volume":"2","author":"Wu Hong","year":"2004","unstructured":"Hong Wu , Hanqing Lu , and Songde Ma. Willhunter : interactive image retrieval with multilevel relevance . In ICPR , volume 2 , pages 1009 -- 1012 . IEEE, 2004 . Hong Wu, Hanqing Lu, and Songde Ma. Willhunter: interactive image retrieval with multilevel relevance. In ICPR, volume 2, pages 1009--1012. IEEE, 2004."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-50077-5_6"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.32"},{"key":"e_1_3_2_1_53_1","volume-title":"International Conference on Computer Vision (ICCV)","author":"Yu Grauman K.","unstructured":"Grauman K. Yu , A. Semantic jitter : Dense supervision for visual comparisons via synthetic images . In International Conference on Computer Vision (ICCV) , 2017. Grauman K. Yu, A. Semantic jitter: Dense supervision for visual comparisons via synthetic images. 
In International Conference on Computer Vision (ICCV), 2017."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2016.2603342"}],"event":{"name":"MM '19: The 27th ACM International Conference on Multimedia","location":"Nice France","acronym":"MM '19","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 27th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3350935","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3343031.3350935","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:17Z","timestamp":1750201997000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3350935"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,15]]},"references-count":54,"alternative-id":["10.1145\/3343031.3350935","10.1145\/3343031"],"URL":"https:\/\/doi.org\/10.1145\/3343031.3350935","relation":{},"subject":[],"published":{"date-parts":[[2019,10,15]]},"assertion":[{"value":"2019-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}