{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T17:48:42Z","timestamp":1772905722141,"version":"3.50.1"},"reference-count":94,"publisher":"Association for Computing Machinery (ACM)","issue":"2s","license":[{"start":{"date-parts":[[2021,6,14]],"date-time":"2021-06-14T00:00:00Z","timestamp":1623628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key Research Program of Frontier Sciences, CAS","award":["ZDBS-LY-JSC038"],"award-info":[{"award-number":["ZDBS-LY-JSC038"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61807033"],"award-info":[{"award-number":["61807033"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Program of China","award":["2017YFB0801900"],"award-info":[{"award-number":["2017YFB0801900"]}]},{"DOI":"10.13039\/501100004739","name":"Youth Innovation Promotion Association, CAS","doi-asserted-by":"crossref","award":["2020111"],"award-info":[{"award-number":["2020111"]}],"id":[{"id":"10.13039\/501100004739","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Outstanding Youth Scientist Project of ISCAS"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2021,6,21]]},"abstract":"<jats:p>Weakly supervised object detection (WSOD), aiming to detect objects with only image-level annotations, has become one of the research hotspots over the past few years. Recently, much effort has been devoted to WSOD for the simple yet effective architecture and remarkable improvements have been achieved. 
Existing approaches using multiple-instance learning usually pay more attention to the proposals individually, ignoring relation information between proposals. Besides, to obtain pseudo-ground-truth boxes for WSOD, MIL-based methods tend to select the region with the highest confidence score and regard those with small overlap as the background category, which leads to mislabeled instances. As a result, these methods suffer from mislabeled instances and a lack of relations between proposals, degrading the performance of WSOD. To tackle these issues, this article introduces a multi-peak graph-based model for WSOD. Specifically, we use the instance graph to model the relations between proposals, which reinforces the multiple-instance learning process. In addition, a multi-peak discovery strategy is designed to avert mislabeling instances. The proposed model is trained with a stochastic gradient descent optimizer using back-propagation in an end-to-end manner. Extensive quantitative and qualitative evaluations on two challenging public benchmarks, PASCAL VOC 2007 and PASCAL VOC 2012, demonstrate the superiority and effectiveness of the proposed approach.<\/jats:p>","DOI":"10.1145\/3432861","type":"journal-article","created":{"date-parts":[[2021,6,14]],"date-time":"2021-06-14T12:55:42Z","timestamp":1623675342000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Multi-peak Graph-based Multi-instance Learning for Weakly Supervised Object Detection"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8918-0981","authenticated-orcid":false,"given":"Ruyi","family":"Ji","sequence":"first","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zeyu","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Automation, China University of Petroleum, Beijing, Changping, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Libo","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianwei","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Automation, China University of Petroleum, Beijing, Changping, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xin","family":"Zuo","sequence":"additional","affiliation":[{"name":"Department of Automation, China University of Petroleum, Beijing, Changping, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanjun","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen","family":"Zhao","sequence":"additional","affiliation":[{"name":"Institute of Software, Chinese Academy of Sciences, Zhong Guan Cun, Haidian, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haofeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing Institute of Computer Technology and Applications, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lin","family":"Yang","sequence":"additional","affiliation":[{"name":"Beijing Institute of Computer Technology and Applications, Beijing, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,14]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00002"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Aditya Arun C. V. Jawahar and M. Pawan Kumar. 2018. Dissimilarity coefficient-based weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1811.10016.  Aditya Arun C. V. Jawahar and M. Pawan Kumar. 2018. Dissimilarity coefficient-based weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1811.10016.","DOI":"10.1109\/CVPR.2019.00966"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.194"},{"key":"e_1_2_1_4_1","unstructured":"Hakan Bilen and Andrea Vedaldi. 2015. Weakly supervised deep detection networks. Retrieved from http:\/\/arxiv.org\/abs\/1511.02853.  Hakan Bilen and Andrea Vedaldi. 2015. Weakly supervised deep detection networks. Retrieved from http:\/\/arxiv.org\/abs\/1511.02853."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.)","volume":"80","author":"Bojchevski Aleksandar","year":"2018","unstructured":"Aleksandar Bojchevski , Oleksandr Shchur , Daniel Z\u00fcgner , and Stephan G\u00fcnnemann . 2018 . NetGAN: Generating graphs via random walks . In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.) , Vol. 80 . PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, 610\u2013619. Retrieved from http:\/\/proceedings.mlr.press\/v80\/bojchevski18a.html. Aleksandar Bojchevski, Oleksandr Shchur, Daniel Z\u00fcgner, and Stephan G\u00fcnnemann. 2018. NetGAN: Generating graphs via random walks. 
In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, 610\u2013619. Retrieved from http:\/\/proceedings.mlr.press\/v80\/bojchevski18a.html."},{"key":"e_1_2_1_6_1","unstructured":"Nicola De Cao and Thomas Kipf. 2018. MolGAN: An implicit generative model for small molecular graphs. Retrieved from https:\/\/abs\/1805.11973.  Nicola De Cao and Thomas Kipf. 2018. MolGAN: An implicit generative model for small molecular graphs. Retrieved from https:\/\/abs\/1805.11973."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Neelima Chavali Harsh Agrawal Aroma Mahendru and Dhruv Batra. 2015. Object-proposal evaluation protocol is \u201cgameable.\u201d Retrieved from http:\/\/arxiv.org\/abs\/1505.05836.  Neelima Chavali Harsh Agrawal Aroma Mahendru and Dhruv Batra. 2015. Object-proposal evaluation protocol is \u201cgameable.\u201d Retrieved from http:\/\/arxiv.org\/abs\/1505.05836.","DOI":"10.1109\/CVPR.2016.97"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.440"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20138.","author":"Chum O.","unstructured":"O. Chum and A. Zisserman . 2007. An exemplar model for learning object classes . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20138. O. Chum and A. Zisserman. 2007. An exemplar model for learning object classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
1\u20138."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2535231"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157527"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-012-0538-3"},{"key":"e_1_2_1_13_1","volume-title":"Hamed Pirsiavash, and Luc Van Gool.","author":"Diba Ali","year":"2016","unstructured":"Ali Diba , Vivek Sharma , Ali Mohammad Pazandeh , Hamed Pirsiavash, and Luc Van Gool. 2016 . Weakly supervised cascaded convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1611.08258. Ali Diba, Vivek Sharma, Ali Mohammad Pazandeh, Hamed Pirsiavash, and Luc Van Gool. 2016. Weakly supervised cascaded convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1611.08258."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(96)00034-3"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1271\u20131278","author":"Divvala S. K.","unstructured":"S. K. Divvala , D. Hoiem , J. H. Hays , A. A. Efros , and M. Hebert . 2009. An empirical study of context in object detection . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1271\u20131278 . S. K. Divvala, D. Hoiem, J. H. Hays, A. A. Efros, and M. Hebert. 2009. An empirical study of context in object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1271\u20131278."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-014-0733-5"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/11736790_8"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/11736790_8"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 768\u2013769","author":"Zeni Luis Felipe","unstructured":"Luis Felipe Zeni and Claudio R. Jung . 2020. 
Distilling knowledge from refinement in multiple-instance detection networks . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 768\u2013769 . Luis Felipe Zeni and Claudio R. Jung. 2020. Distilling knowledge from refinement in multiple-instance detection networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 768\u2013769."},{"key":"e_1_2_1_20_1","volume-title":"Berg","author":"Fu Cheng-Yang","year":"2017","unstructured":"Cheng-Yang Fu , Wei Liu , Ananth Ranga , Ambrish Tyagi , and Alexander C . Berg . 2017 . DSSD : Deconvolutional single-shot detector. Retrieved from http:\/\/arxiv.org\/abs\/1701.06659. Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C. Berg. 2017. DSSD: Deconvolutional single-shot detector. Retrieved from http:\/\/arxiv.org\/abs\/1701.06659."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2010.02.004"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20138.","author":"Galleguillos C.","unstructured":"C. Galleguillos , A. Rabinovich , and S. Belongie . 2008. Object categorization using co-occurrence, location and appearance . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20138. C. Galleguillos, A. Rabinovich, and S. Belongie. 2008. Object categorization using co-occurrence, location and appearance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1\u20138."},{"key":"e_1_2_1_23_1","volume-title":"Davis","author":"Gao Mingfei","year":"2017","unstructured":"Mingfei Gao , Ang Li , Ruichi Yu , Vlad I. Morariu , and Larry S . Davis . 2017 . C-WSL: Count-guided weakly supervised localization. Retrieved from http:\/\/arxiv.org\/abs\/1711.05282. Mingfei Gao, Ang Li, Ruichi Yu, Vlad I. Morariu, and Larry S. Davis. 2017. C-WSL: Count-guided weakly supervised localization. 
Retrieved from http:\/\/arxiv.org\/abs\/1711.05282."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305512"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/553897"},{"key":"e_1_2_1_28_1","volume-title":"Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"He K.","unstructured":"K. He , G. Gkioxari , P. Doll\u00e1r , and R. Girshick . 2017 . Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917) . 2980\u20132988. K. He, G. Gkioxari, P. Doll\u00e1r, and R. Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917). 2980\u20132988."},{"key":"e_1_2_1_29_1","unstructured":"Kaiming He Georgia Gkioxari Piotr Doll\u00e1r and Ross Girshick. 2017. Mask R-CNN. Retrieved from http:\/\/arxiv.org\/abs\/1703.06870.  Kaiming He Georgia Gkioxari Piotr Doll\u00e1r and Ross Girshick. 2017. Mask R-CNN. Retrieved from http:\/\/arxiv.org\/abs\/1703.06870."},{"key":"e_1_2_1_30_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Deep residual learning for image recognition. Retrieved from http:\/\/arxiv.org\/abs\/1512.03385.  Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Deep residual learning for image recognition. Retrieved from http:\/\/arxiv.org\/abs\/1512.03385."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.28.24"},{"key":"e_1_2_1_32_1","unstructured":"Jan Hendrik Hosang Rodrigo Benenson Piotr Doll\u00e1r and Bernt Schiele. 2015. What makes for effective detection proposals? Retrieved from http:\/\/arxiv.org\/abs\/1502.05082.  Jan Hendrik Hosang Rodrigo Benenson Piotr Doll\u00e1r and Bernt Schiele. 2015. What makes for effective detection proposals? 
Retrieved from http:\/\/arxiv.org\/abs\/1502.05082."},{"key":"e_1_2_1_33_1","unstructured":"Han Hu Jiayuan Gu Zheng Zhang Jifeng Dai and Yichen Wei. 2017. Relation networks for object detection. Retrieved from http:\/\/arxiv.org\/abs\/1711.11575.  Han Hu Jiayuan Gu Zheng Zhang Jifeng Dai and Yichen Wei. 2017. Relation networks for object detection. Retrieved from http:\/\/arxiv.org\/abs\/1711.11575."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.)","volume":"80","author":"Ilse Maximilian","year":"2018","unstructured":"Maximilian Ilse , Jakub Tomczak , and Max Welling . 2018 . Attention-based deep multiple-instance learning . In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.) , Vol. 80 . PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, 2127\u20132136. Retrieved from http:\/\/proceedings.mlr.press\/v80\/ilse18a.html. Maximilian Ilse, Jakub Tomczak, and Max Welling. 2018. Attention-based deep multiple-instance learning. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, Stockholmsm\u00e4ssan, Stockholm Sweden, 2127\u20132136. Retrieved from http:\/\/proceedings.mlr.press\/v80\/ilse18a.html."},{"key":"e_1_2_1_35_1","volume-title":"Silvio Savarese, and Ashutosh Saxena.","author":"Jain Ashesh","year":"2015","unstructured":"Ashesh Jain , Amir Roshan Zamir , Silvio Savarese, and Ashutosh Saxena. 2015 . Structural-RNN: Deep learning on spatio-temporal graphs. Retrieved from http:\/\/dblp.uni-trier.de\/db\/journals\/corr\/corr1511.html#JainZSS15. Ashesh Jain, Amir Roshan Zamir, Silvio Savarese, and Ashutosh Saxena. 2015. Structural-RNN: Deep learning on spatio-temporal graphs. 
Retrieved from http:\/\/dblp.uni-trier.de\/db\/journals\/corr\/corr1511.html#JainZSS15."},{"key":"e_1_2_1_36_1","unstructured":"Ruyi Ji Dawei Du Libo Zhang Longyin Wen Yanjun Wu Chen Zhao Feiyue Huang and Siwei Lyu. 2019. Learning semantic neural tree for human parsing. Retrieved from http:\/\/arxiv.org\/abs\/1912.09622.  Ruyi Ji Dawei Du Libo Zhang Longyin Wen Yanjun Wu Chen Zhao Feiyue Huang and Siwei Lyu. 2019. Learning semantic neural tree for human parsing. Retrieved from http:\/\/arxiv.org\/abs\/1912.09622."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01048"},{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Zequn Jie Yunchao Wei Xiaojie Jin Jiashi Feng and Wei Liu. 2017. Deep self-taught learning for weakly supervised object localization. Retrieved from http:\/\/arxiv.org\/abs\/1704.05188.  Zequn Jie Yunchao Wei Xiaojie Jin Jiashi Feng and Wei Liu. 2017. Deep self-taught learning for weakly supervised object localization. Retrieved from http:\/\/arxiv.org\/abs\/1704.05188.","DOI":"10.1109\/CVPR.2017.457"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Vadim Kantorov Maxime Oquab Minsu Cho and Ivan Laptev. 2016. ContextLocNet: Context-aware deep network models for weakly supervised localization. Retrieved from http:\/\/arxiv.org\/abs\/1609.04331.  Vadim Kantorov Maxime Oquab Minsu Cho and Ivan Laptev. 2016. ContextLocNet: Context-aware deep network models for weakly supervised localization. Retrieved from http:\/\/arxiv.org\/abs\/1609.04331.","DOI":"10.1007\/978-3-319-46454-1_22"},{"key":"e_1_2_1_40_1","unstructured":"Thomas Kipf and Max Welling. 2016. Variational graph auto-encoders. Retrieved from https:\/\/abs\/1611.07308.  Thomas Kipf and Max Welling. 2016. Variational graph auto-encoders. Retrieved from https:\/\/abs\/1611.07308."},{"key":"e_1_2_1_41_1","volume-title":"Kipf and Max Welling","author":"Thomas","year":"2016","unstructured":"Thomas N. Kipf and Max Welling . 2016 . 
Semi-supervised classification with graph convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1609.02907. Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1609.02907."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2019.2930913"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783380"},{"key":"e_1_2_1_44_1","unstructured":"Xiaoyan Li Meina Kan Shiguang Shan and Xilin Chen. 2019. Weakly supervised object detection with segmentation collaboration. Retrieved from http:\/\/arxiv.org\/abs\/1904.00551.  Xiaoyan Li Meina Kan Shiguang Shan and Xilin Chen. 2019. Weakly supervised object detection with segmentation collaboration. Retrieved from http:\/\/arxiv.org\/abs\/1904.00551."},{"key":"e_1_2_1_45_1","unstructured":"Yaguang Li Rose Yu Cyrus Shahabi and Yan Liu. 2017. Graph convolutional recurrent neural network: Data-driven traffic forecasting. Retrieved from http:\/\/arxiv.org\/abs\/1707.01926.  Yaguang Li Rose Yu Cyrus Shahabi and Yan Liu. 2017. Graph convolutional recurrent neural network: Data-driven traffic forecasting. Retrieved from http:\/\/arxiv.org\/abs\/1707.01926."},{"key":"e_1_2_1_46_1","unstructured":"Chenhao Lin Siwen Wang Dongqi Xu Yu Lu and Wayne Zhang. 2020. Object instance mining for weakly supervised object detection. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI\u201920) the 32nd Innovative Applications of Artificial Intelligence Conference (IAAI\u201920) and the 10th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI\u201920). AAAI Press 11482\u201311489. Retrieved from https:\/\/aaai.org\/ojs\/index.php\/AAAI\/article\/view\/6813.  Chenhao Lin Siwen Wang Dongqi Xu Yu Lu and Wayne Zhang. 2020. Object instance mining for weakly supervised object detection. 
In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI\u201920) the 32nd Innovative Applications of Artificial Intelligence Conference (IAAI\u201920) and the 10th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI\u201920). AAAI Press 11482\u201311489. Retrieved from https:\/\/aaai.org\/ojs\/index.php\/AAAI\/article\/view\/6813."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_2_1_48_1","volume-title":"Berg","author":"Liu Wei","year":"2015","unstructured":"Wei Liu , Dragomir Anguelov , Dumitru Erhan , Christian Szegedy , Scott E. Reed , Cheng-Yang Fu , and Alexander C . Berg . 2015 . SSD : Single-shot MultiBox detector. Retrieved from http:\/\/arxiv.org\/abs\/1512.02325. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. 2015. SSD: Single-shot MultiBox detector. Retrieved from http:\/\/arxiv.org\/abs\/1512.02325."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018778"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/302528.302753"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","author":"Oquab M.","unstructured":"M. Oquab , L. Bottou , I. Laptev , and J. Sivic . 2015. Is object localization for free? - Weakly-supervised learning with convolutional neural networks . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915) . 685\u2013694. M. Oquab, L. Bottou, I. Laptev, and J. Sivic. 2015. Is object localization for free? - Weakly-supervised learning with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915). 
685\u2013694."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.5555\/3304889.3305023"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126383"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1052"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"key":"e_1_2_1_56_1","unstructured":"Minlong Peng and Qi Zhang. 2019. Address instance-level label prediction in multiple-instance learning. Retrieved from http:\/\/arxiv.org\/abs\/1905.12226.  Minlong Peng and Qi Zhang. 2019. Address instance-level label prediction in multiple-instance learning. Retrieved from http:\/\/arxiv.org\/abs\/1905.12226."},{"key":"e_1_2_1_57_1","volume-title":"Pinheiro and Ronan Collobert","author":"Pedro H.","year":"2014","unstructured":"Pedro H. O. Pinheiro and Ronan Collobert . 2014 . Weakly supervised semantic segmentation with convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1411.6228. Pedro H. O. Pinheiro and Ronan Collobert. 2014. Weakly supervised semantic segmentation with convolutional networks. Retrieved from http:\/\/arxiv.org\/abs\/1411.6228."},{"key":"e_1_2_1_58_1","volume-title":"Ross B. Girshick, and Ali Farhadi.","author":"Redmon Joseph","year":"2015","unstructured":"Joseph Redmon , Santosh Kumar Divvala , Ross B. Girshick, and Ali Farhadi. 2015 . You only look once: Unified, real-time object detection. Retrieved from http:\/\/arxiv.org\/abs\/1506.02640. Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. 2015. You only look once: Unified, real-time object detection. 
Retrieved from http:\/\/arxiv.org\/abs\/1506.02640."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969250"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2456908"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005605"},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Shen Y.","unstructured":"Y. Shen , R. Ji , Y. Wang , Y. Wu , and L. Cao . 2019. Cyclic guidance for weakly supervised joint detection and segmentation . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919) . 697\u2013707. Y. Shen, R. Ji, Y. Wang, Y. Wu, and L. Cao. 2019. Cyclic guidance for weakly supervised joint detection and segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919). 697\u2013707."},{"key":"e_1_2_1_64_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5764\u20135773","author":"Shen Y.","unstructured":"Y. Shen , R. Ji , S. Zhang , W. Zuo , and Y. Wang . 2018. Generative adversarial learning towards fast weakly supervised detection . In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5764\u20135773 . Y. Shen, R. Ji, S. Zhang, W. Zuo, and Y. Wang. 2018. Generative adversarial learning towards fast weakly supervised detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5764\u20135773."},{"key":"e_1_2_1_65_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Shi M.","unstructured":"M. Shi , H. Caesar , and V. Ferrari . 2017. Weakly supervised object localization using things and stuff transfer . 
In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917) . 3401\u20133410. M. Shi, H. Caesar, and V. Ferrari. 2017. Weakly supervised object localization using things and stuff transfer. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917). 3401\u20133410."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2392769"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1007\/11744023_1"},{"key":"e_1_2_1_68_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman . 2015 . Very deep convolutional networks for large-scale image recognition . In Proceedings of the International Conference on Learning Representations. Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00964"},{"key":"e_1_2_1_70_1","volume-title":"Stefanie Jegelka, and Trevor Darrell.","author":"Song Hyun Oh","year":"2014","unstructured":"Hyun Oh Song , Yong Jae Lee , Stefanie Jegelka, and Trevor Darrell. 2014 . Weakly-supervised discovery of visual pattern configurations. Retrieved from http:\/\/arxiv.org\/abs\/1406.6507. Hyun Oh Song, Yong Jae Lee, Stefanie Jegelka, and Trevor Darrell. 2014. Weakly-supervised discovery of visual pattern configurations. Retrieved from http:\/\/arxiv.org\/abs\/1406.6507."},{"key":"e_1_2_1_71_1","volume-title":"Yuille","author":"Tang Peng","year":"2018","unstructured":"Peng Tang , Xinggang Wang , Song Bai , Wei Shen , Xiang Bai , Wenyu Liu , and Alan L . Yuille . 2018 . PCL : Proposal cluster learning for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1807.03342. 
Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, and Alan L. Yuille. 2018. PCL: Proposal cluster learning for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1807.03342."},{"key":"e_1_2_1_72_1","doi-asserted-by":"crossref","unstructured":"Peng Tang Xinggang Wang Xiang Bai and Wenyu Liu. 2017. Multiple-instance detection network with online instance classifier refinement. Retrieved from http:\/\/arxiv.org\/abs\/1704.00138.  Peng Tang Xinggang Wang Xiang Bai and Wenyu Liu. 2017. Multiple-instance detection network with online instance classifier refinement. Retrieved from http:\/\/arxiv.org\/abs\/1704.00138.","DOI":"10.1109\/CVPR.2017.326"},{"key":"e_1_2_1_73_1","unstructured":"Peng Tang Xinggang Wang Zilong Huang Xiang Bai and Wenyu Liu. 2017. Deep patch learning for weakly supervised object classification and discovery. Retrieved from http:\/\/arxiv.org\/abs\/1705.02429.  Peng Tang Xinggang Wang Zilong Huang Xiang Bai and Wenyu Liu. 2017. Deep patch learning for weakly supervised object classification and discovery. Retrieved from http:\/\/arxiv.org\/abs\/1705.02429."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01252-6_22"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.5555\/946247.946665"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2009.186"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0620-5"},{"key":"e_1_2_1_78_1","unstructured":"Rianne van den Berg Thomas Kipf and Max Welling. 2017. Graph convolutional matrix completion. Retrieved from https:\/\/abs\/1706.02263.  Rianne van den Berg Thomas Kipf and Max Welling. 2017. Graph convolutional matrix completion. 
Retrieved from https:\/\/abs\/1706.02263."},{"key":"e_1_2_1_79_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Veli\u010dkovi\u0107 Petar","year":"2018","unstructured":"Petar Veli\u010dkovi\u0107 , Guillem Cucurull , Arantxa Casanova , Adriana Romero , Pietro Li\u00f2 , and Yoshua Bengio . 2018 . Graph attention networks . In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=rJXMpikCZ. Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li\u00f2, and Yoshua Bengio. 2018. Graph attention networks. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=rJXMpikCZ."},{"key":"e_1_2_1_80_1","doi-asserted-by":"crossref","unstructured":"Fang Wan Chang Liu Wei Ke Xiangyang Ji Jianbin Jiao and Qixiang Ye. 2019. C-MIL: Continuation multiple-instance learning for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1904.05647.  Fang Wan Chang Liu Wei Ke Xiangyang Ji Jianbin Jiao and Qixiang Ye. 2019. C-MIL: Continuation multiple-instance learning for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1904.05647.","DOI":"10.1109\/CVPR.2019.00230"},{"key":"e_1_2_1_81_1","doi-asserted-by":"crossref","unstructured":"Fang Wan Pengxu Wei Zhenjun Han Jianbin Jiao and Qixiang Ye. 2019. Min-entropy latent model for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1902.06057.  Fang Wan Pengxu Wei Zhenjun Han Jianbin Jiao and Qixiang Ye. 2019. Min-entropy latent model for weakly supervised object detection. 
Retrieved from http:\/\/arxiv.org\/abs\/1902.06057.","DOI":"10.1109\/CVPR.2018.00141"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_28"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00308"},{"key":"e_1_2_1_84_1","unstructured":"Xiaolong Wang Ross B. Girshick Abhinav Gupta and Kaiming He. 2017. Non-local neural networks. Retrieved from http:\/\/arxiv.org\/abs\/1711.07971.  Xiaolong Wang Ross B. Girshick Abhinav Gupta and Kaiming He. 2017. Non-local neural networks. Retrieved from http:\/\/arxiv.org\/abs\/1711.07971."},{"key":"e_1_2_1_85_1","volume-title":"Huang","author":"Wei Yunchao","year":"2018","unstructured":"Yunchao Wei , Zhiqiang Shen , Bowen Cheng , Honghui Shi , Jinjun Xiong , Jiashi Feng , and Thomas S . Huang . 2018 . TS2C: Tight box mining with surrounding segmentation context for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1807.04897. Yunchao Wei, Zhiqiang Shen, Bowen Cheng, Honghui Shi, Jinjun Xiong, Jiashi Feng, and Thomas S. Huang. 2018. TS2C: Tight box mining with surrounding segmentation context for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1807.04897."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298968"},{"key":"e_1_2_1_87_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.)","volume":"80","author":"You Jiaxuan","year":"2018","unstructured":"Jiaxuan You , Rex Ying , Xiang Ren , William Hamilton , and Jure Leskovec . 2018 . GraphRNN: Generating realistic graphs with deep auto-regressive models . In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.) , Vol. 80 , 5708\u20135717. Retrieved from http:\/\/proceedings.mlr.press\/v80\/you18a.html. 
Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. 2018. GraphRNN: Generating realistic graphs with deep auto-regressive models. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80, 5708\u20135717. Retrieved from http:\/\/proceedings.mlr.press\/v80\/you18a.html."},{"key":"e_1_2_1_88_1","unstructured":"Jiani Zhang, Xingjian Shi, Junyuan Xie, Hao Ma, Irwin King, and Dit-Yan Yeung. 2018. GaAN: Gated attention networks for learning on large and spatiotemporal graphs. Retrieved from http:\/\/arxiv.org\/abs\/1803.07294."},{"key":"e_1_2_1_89_1","volume-title":"Li","author":"Zhang Shifeng","year":"2017","unstructured":"Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, and Stan Z. Li. 2017. Single-shot refinement neural network for object detection. Retrieved from http:\/\/arxiv.org\/abs\/1711.06897."},{"key":"e_1_2_1_90_1","doi-asserted-by":"crossref","unstructured":"Xiaopeng Zhang, Jiashi Feng, Hongkai Xiong, and Qi Tian. 2018. Zigzag learning for weakly supervised object detection. Retrieved from http:\/\/arxiv.org\/abs\/1804.09466.","DOI":"10.1109\/CVPR.2018.00448"},{"key":"e_1_2_1_91_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 928\u2013936","author":"Zhang Y.","unstructured":"Y. Zhang, Y. Bai, M. Ding, Y. Li, and B. Ghanem. 
2018. W2F: A weakly-supervised to fully-supervised framework for object detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 928\u2013936."},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553534"},{"key":"e_1_2_1_93_1","volume-title":"Proceedings of the International Conference on Intelligent Information Technology.","author":"Zhou Zhi-Hua","year":"2002","unstructured":"Zhi-Hua Zhou and Min-Ling Zhang. 2002. Neural networks for multi-instance learning. In Proceedings of the International Conference on Intelligent Information Technology."},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_26"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3432861","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3432861","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:11Z","timestamp":1750193231000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3432861"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,14]]},"references-count":94,"journal-issue":{"issue":"2s","published-print":{"date-parts":[[2021,6,21]]}},"alternative-id":["10.1145\/3432861"],"URL":"https:\/\/doi.org\/10.1145\/3432861","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,14]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}