{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T21:55:15Z","timestamp":1775253315032,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,10,15]],"date-time":"2018-10-15T00:00:00Z","timestamp":1539561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61771457 61732007 61532018 61332016 61620106009 61672497 U"],"award-info":[{"award-number":["61771457 61732007 61532018 61332016 61620106009 61672497 U"]}]},{"name":"National Basic Research Program of China","award":["2015CB351800"],"award-info":[{"award-number":["2015CB351800"]}]},{"name":"Key Research Program of Frontier Sciences","award":["CAS: QYZDJ-SSW-SYS013"],"award-info":[{"award-number":["CAS: QYZDJ-SSW-SYS013"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,10,15]]},"DOI":"10.1145\/3240508.3240649","type":"proceedings-article","created":{"date-parts":[[2018,10,18]],"date-time":"2018-10-18T13:52:08Z","timestamp":1539870728000},"page":"1092-1100","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":32,"title":["Attentive Recurrent Neural Network for Weak-supervised Multi-label Image Classification"],"prefix":"10.1145","author":[{"given":"Liang","family":"Li","sequence":"first","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, China"}]},{"given":"Shuhui","family":"Wang","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences, Beijing, China"}]},{"given":"Shuqiang","family":"Jiang","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences & University of Chinese Academy of Sciences, 
Beijing, China"}]},{"given":"Qingming","family":"Huang","sequence":"additional","affiliation":[{"name":"Chinese Academy of Sciences & University of Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2018,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755","author":"Ba Jimmy","year":"2014","unstructured":"Jimmy Ba , Volodymyr Mnih , and Koray Kavukcuoglu . 2014. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755 ( 2014 ). Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu. 2014. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755 (2014)."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.104"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2463223"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1646396.1646452"},{"key":"e_1_3_2_1_5_1","volume-title":"NIPS Workshop .","author":"Collobert Ronan","year":"2011","unstructured":"Ronan Collobert , Koray Kavukcuoglu , and Cl\u00e9ment Farabet . 2011 . Torch7: A matlab-like environment for machine learning. In BigLearn , NIPS Workshop . Ronan Collobert, Koray Kavukcuoglu, and Cl\u00e9ment Farabet. 2011. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop ."},{"key":"e_1_3_2_1_6_1","volume-title":"Imagenet: A large-scale hierarchical image database","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database . In CVPR. IEEE , 248--255. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In CVPR. 
IEEE, 248--255."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_2_1_8_1","volume-title":"Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894","author":"Gong Yunchao","year":"2013","unstructured":"Yunchao Gong , Yangqing Jia , Thomas Leung , Alexander Toshev , and Sergey Ioffe . 2013. Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894 ( 2013 ). Yunchao Gong, Yangqing Jia, Thomas Leung, Alexander Toshev, and Sergey Ioffe. 2013. Deep convolutional ranking for multilabel image annotation. arXiv preprint arXiv:1312.4894 (2013)."},{"key":"e_1_3_2_1_9_1","volume-title":"Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation","author":"Guillaumin Matthieu","year":"2009","unstructured":"Matthieu Guillaumin , Thomas Mensink , Jakob Verbeek , and Cordelia Schmid . 2009 . Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation . In ICCV. IEEE , 309--316. Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, and Cordelia Schmid. 2009. Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation. In ICCV. IEEE, 309--316."},{"key":"e_1_3_2_1_10_1","volume-title":"Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385","author":"He Kaiming","year":"2015","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2015. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 ( 2015 ). Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Hexiang Hu Guang-Tong Zhou Zhiwei Deng Zicheng Liao and Greg Mori. 2016. Learning structured inference neural networks with label relations. In CVPR . 2960--2968. 
Hexiang Hu Guang-Tong Zhou Zhiwei Deng Zicheng Liao and Greg Mori. 2016. Learning structured inference neural networks with label relations. In CVPR . 2960--2968.","DOI":"10.1109\/CVPR.2016.323"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123403"},{"key":"e_1_3_2_1_13_1","volume-title":"Active Convolution: Learning the Shape of Convolution for Image Classification. In CVPR .","author":"Jeon Yunho","year":"2017","unstructured":"Yunho Jeon and Junmo Kim . 2017 . Active Convolution: Learning the Shape of Convolution for Image Classification. In CVPR . Yunho Jeon and Junmo Kim. 2017. Active Convolution: Learning the Shape of Convolution for Image Classification. In CVPR ."},{"key":"e_1_3_2_1_14_1","volume-title":"Annotation order matters: Recurrent image annotator for arbitrary length image tagging. arXiv preprint arXiv:1604.05225","author":"Jin Jiren","year":"2016","unstructured":"Jiren Jin and Hideki Nakayama . 2016. Annotation order matters: Recurrent image annotator for arbitrary length image tagging. arXiv preprint arXiv:1604.05225 ( 2016 ). Jiren Jin and Hideki Nakayama. 2016. Annotation order matters: Recurrent image annotator for arbitrary length image tagging. arXiv preprint arXiv:1604.05225 (2016)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.31"},{"key":"e_1_3_2_1_16_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2545667"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2751607"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Qiang Li Maoying Qiao Wei Bian and Dacheng Tao. 2016. Conditional graphical lasso for multi-label image classification. In CVPR . 2977--2986. Qiang Li Maoying Qiao Wei Bian and Dacheng Tao. 2016. Conditional graphical lasso for multi-label image classification. In CVPR . 2977--2986.","DOI":"10.1109\/CVPR.2016.325"},{"key":"e_1_3_2_1_20_1","unstructured":"Yunsheng Li Mandar Dixit and Nuno Vasconcelos. 2017a. Deep Scene Image Classification With the MFAFVNet. In ICCV . Yunsheng Li Mandar Dixit and Nuno Vasconcelos. 2017a. Deep Scene Image Classification With the MFAFVNet. In ICCV ."},{"key":"e_1_3_2_1_21_1","unstructured":"Yuncheng Li Yale Song and Jiebo Luo. 2017b. Improving Pairwise Ranking for Multi-label Image Classification. In CVPR . Yuncheng Li Yale Song and Jiebo Luo. 2017b. Improving Pairwise Ranking for Multi-label Image Classification. In CVPR ."},{"key":"e_1_3_2_1_22_1","volume-title":"Microsoft coco: Common objects in context","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin , Michael Maire , Serge Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r , and C Lawrence Zitnick . 2014. Microsoft coco: Common objects in context . In ECCV. Springer , 740--755. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV. Springer, 740--755."},{"key":"e_1_3_2_1_23_1","unstructured":"Volodymyr Mnih Nicolas Heess Alex Graves and koray kavukcuoglu. 2014. Recurrent Models of Visual Attention. In Advances in Neural Information Processing Systems 27. 2204--2212. Volodymyr Mnih Nicolas Heess Alex Graves and koray kavukcuoglu. 2014. 
Recurrent Models of Visual Attention. In Advances in Neural Information Processing Systems 27. 2204--2212."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749391"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.222"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"M. Oquab L. Bottou I. Laptev and J. Sivic. 2015. Is object localization for free? -- Weakly-supervised learning with convolutional neural networks. In CVPR . M. Oquab L. Bottou I. Laptev and J. Sivic. 2015. Is object localization for free? -- Weakly-supervised learning with convolutional neural networks. In CVPR .","DOI":"10.1109\/CVPR.2015.7298668"},{"key":"e_1_3_2_1_27_1","volume-title":"Topic regression multi-modal latent dirichlet allocation for image annotation","author":"Putthividhy Duangmanee","unstructured":"Duangmanee Putthividhy , Hagai T Attias , and Srikantan S Nagarajan . 2010. Topic regression multi-modal latent dirichlet allocation for image annotation . In CVPR. IEEE , 3408--3415. Duangmanee Putthividhy, Hagai T Attias, and Srikantan S Nagarajan. 2010. Topic regression multi-modal latent dirichlet allocation for image annotation. In CVPR. IEEE, 3408--3415."},{"key":"e_1_3_2_1_28_1","volume-title":"The dynamic representation of scenes. Visual cognition","author":"Rensink Ronald A","year":"2000","unstructured":"Ronald A Rensink . 2000. The dynamic representation of scenes. Visual cognition , Vol. 7 , 1--3 ( 2000 ), 17--42. Ronald A Rensink. 2000. The dynamic representation of scenes. Visual cognition , Vol. 7, 1--3 (2000), 17--42."},{"key":"e_1_3_2_1_29_1","volume-title":"Juan Jos\u00e9 Del Coz, and Eyke H\u00fcllermeier","author":"Senge Robin","year":"2014","unstructured":"Robin Senge , Juan Jos\u00e9 Del Coz, and Eyke H\u00fcllermeier . 2014 . On the problem of error propagation in classifier chains for multi-label classification. In Data Analysis, Machine Learning and Knowledge Discovery. 
Springer , 163--170. Robin Senge, Juan Jos\u00e9 Del Coz, and Eyke H\u00fcllermeier. 2014. On the problem of error propagation in classifier chains for multi-label classification. In Data Analysis, Machine Learning and Knowledge Discovery. Springer, 163--170."},{"key":"e_1_3_2_1_30_1","volume-title":"Max-Correlation Objectives, and Correntropy Loss for Multilabel Image Classification","author":"Shi Weiwei","year":"2017","unstructured":"Weiwei Shi , Yihong Gong , Xiaoyu Tao , and Nanning Zheng . 2017. Training DCNN by Combining Max-Margin , Max-Correlation Objectives, and Correntropy Loss for Multilabel Image Classification . IEEE TNNLS ( 2017 ). Weiwei Shi, Yihong Gong, Xiaoyu Tao, and Nanning Zheng. 2017. Training DCNN by Combining Max-Margin, Max-Correlation Objectives, and Correntropy Loss for Multilabel Image Classification. IEEE TNNLS (2017)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2014.2298978"},{"key":"e_1_3_2_1_32_1","first-page":"1057","article-title":"Policy gradient methods for reinforcement learning with function approximation","volume":"99","author":"Sutton Richard S","year":"1999","unstructured":"Richard S Sutton , David A McAllester , Satinder P Singh , Yishay Mansour , 1999 . Policy gradient methods for reinforcement learning with function approximation .. In NIPS , Vol. 99. 1057 -- 1063 . Richard S Sutton, David A McAllester, Satinder P Singh, Yishay Mansour, et al. 1999. Policy gradient methods for reinforcement learning with function approximation.. In NIPS , Vol. 99. 1057--1063.","journal-title":"NIPS"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2015.134"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33712-3_60"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/985692.985733"},{"key":"e_1_3_2_1_36_1","volume-title":"CNN-RNN: A Unified Framework for Multi-label Image Classification. 
arXiv preprint arXiv:1604.04573","author":"Wang Jiang","year":"2016","unstructured":"Jiang Wang , Yi Yang , Junhua Mao , Zhiheng Huang , Chang Huang , and Wei Xu. 2016. CNN-RNN: A Unified Framework for Multi-label Image Classification. arXiv preprint arXiv:1604.04573 ( 2016 ). Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, and Wei Xu. 2016. CNN-RNN: A Unified Framework for Multi-label Image Classification. arXiv preprint arXiv:1604.04573 (2016)."},{"key":"e_1_3_2_1_37_1","volume-title":"CNN: Single-label to multi-label. arXiv preprint arXiv:1406.5726","author":"Wei Yunchao","year":"2014","unstructured":"Yunchao Wei , Wei Xia , Junshi Huang , Bingbing Ni , Jian Dong , Yao Zhao , and Shuicheng Yan . 2014 . CNN: Single-label to multi-label. arXiv preprint arXiv:1406.5726 (2014). Yunchao Wei, Wei Xia, Junshi Huang, Bingbing Ni, Jian Dong, Yao Zhao, and Shuicheng Yan. 2014. CNN: Single-label to multi-label. arXiv preprint arXiv:1406.5726 (2014)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Tianjun Xiao Yichong Xu Kuiyuan Yang Jiaxing Zhang Yuxin Peng and Zheng Zhang. 2015. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In CVPR. 842--850. Tianjun Xiao Yichong Xu Kuiyuan Yang Jiaxing Zhang Yuxin Peng and Zheng Zhang. 2015. The application of two-level attention models in deep convolutional neural network for fine-grained image classification. In CVPR. 842--850.","DOI":"10.1109\/CVPR.2015.7298685"},{"key":"e_1_3_2_1_40_1","volume-title":"Xing","author":"Xie Pengtao","year":"2017","unstructured":"Pengtao Xie , Ruslan Salakhutdinov , Luntian Mou , and Eric P . Xing . 2017 . Deep Determinantal Point Process for Large-Scale Multi-Label Classification. In ICCV . Pengtao Xie, Ruslan Salakhutdinov, Luntian Mou, and Eric P. Xing. 2017. 
Deep Determinantal Point Process for Large-Scale Multi-Label Classification. In ICCV ."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126300"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Geng Yan Yang Wang and Zicheng Liao. 2016. LSTM for Image Annotation with Relative Visual Importance. In BMVC . Geng Yan Yang Wang and Zicheng Liao. 2016. LSTM for Image Annotation with Relative Visual Importance. In BMVC .","DOI":"10.5244\/C.30.78"},{"key":"e_1_3_2_1_43_1","volume-title":"Yu Zhang, Bin-Bin Gao, Jianxin Wu, and Jianfei Cai.","author":"Yang Hao","year":"2016","unstructured":"Hao Yang , Joey Tianyi Zhou , Yu Zhang, Bin-Bin Gao, Jianxin Wu, and Jianfei Cai. 2016 b. Exploit bounding box annotations for multi-label object recognition. In CVPR . 280--288. Hao Yang, Joey Tianyi Zhou, Yu Zhang, Bin-Bin Gao, Jianxin Wu, and Jianfei Cai. 2016b. Exploit bounding box annotations for multi-label object recognition. In CVPR . 280--288."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Zichao Yang Xiaodong He Jianfeng Gao Li Deng and Alex Smola. 2016a. Stacked attention networks for image question answering. In CVPR. 21--29. Zichao Yang Xiaodong He Jianfeng Gao Li Deng and Alex Smola. 2016a. Stacked attention networks for image question answering. In CVPR. 21--29.","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_3_2_1_45_1","volume-title":"Image captioning with semantic attention. arXiv preprint arXiv:1603.03925","author":"You Quanzeng","year":"2016","unstructured":"Quanzeng You , Hailin Jin , Zhaowen Wang , Chen Fang , and Jiebo Luo . 2016. Image captioning with semantic attention. arXiv preprint arXiv:1603.03925 ( 2016 ). Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image captioning with semantic attention. arXiv preprint arXiv:1603.03925 (2016)."},{"key":"e_1_3_2_1_46_1","volume-title":"Multi-scale context aggregation by dilated convolutions. 
arXiv preprint arXiv:1511.07122","author":"Yu Fisher","year":"2015","unstructured":"Fisher Yu and Vladlen Koltun . 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 ( 2015 ). Fisher Yu and Vladlen Koltun. 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)."},{"key":"e_1_3_2_1_47_1","volume-title":"Deconvolutional networks","author":"Zeiler Matthew D","unstructured":"Matthew D Zeiler , Dilip Krishnan , Graham W Taylor , and Rob Fergus . 2010. Deconvolutional networks . In CVPR. IEEE , 2528--2535. Matthew D Zeiler, Dilip Krishnan, Graham W Taylor, and Rob Fergus. 2010. Deconvolutional networks. In CVPR. IEEE, 2528--2535."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Feng Zhu Hongsheng Li Wanli Ouyang Nenghai Yu and Xiaogang Wang. 2017. Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification. In CVPR . Feng Zhu Hongsheng Li Wanli Ouyang Nenghai Yu and Xiaogang Wang. 2017. Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification. 
In CVPR .","DOI":"10.1109\/CVPR.2017.219"}],"event":{"name":"MM '18: ACM Multimedia Conference","location":"Seoul Republic of Korea","acronym":"MM '18","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 26th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240649","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3240508.3240649","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T20:40:37Z","timestamp":1775248837000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240649"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,15]]},"references-count":48,"alternative-id":["10.1145\/3240508.3240649","10.1145\/3240508"],"URL":"https:\/\/doi.org\/10.1145\/3240508.3240649","relation":{},"subject":[],"published":{"date-parts":[[2018,10,15]]},"assertion":[{"value":"2018-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}