{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T16:19:20Z","timestamp":1761581960327,"version":"3.41.0"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2020,1,31]],"date-time":"2020-01-31T00:00:00Z","timestamp":1580428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"UK EPSRC","award":["EP\/N011074\/1"],"award-info":[{"award-number":["EP\/N011074\/1"]}]},{"name":"NSFC, China","award":["61876107 and U1803261"],"award-info":[{"award-number":["61876107 and U1803261"]}]},{"name":"973 Plan, China","award":["2015CB856004"],"award-info":[{"award-number":["2015CB856004"]}]},{"name":"European Union's Horizon 2020 research and innovation program under the Marie-Sklodowska-Curie","award":["720325"],"award-info":[{"award-number":["720325"]}]},{"name":"Royal Society-Newton Advanced Fellowship","award":["NA160342"],"award-info":[{"award-number":["NA160342"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2020,1,31]]},"abstract":"<jats:p>Human pose estimation has an important impact on a wide range of applications, from human-computer interface to surveillance and content-based video retrieval. For human pose estimation, joint obstructions and overlapping upon human bodies result in departed pose estimation. To address these problems, by integrating priors of the structure of human bodies, we present a novel structure-aware network to discreetly consider such priors during the training of the network. Typically, learning such constraints is a challenging task. 
Instead, we propose generative adversarial networks as our learning model, in which we design two residual Multiple-Instance Learning (MIL) models with identical architecture: one is used as the generator, and the other as the discriminator. The discriminator's task is to distinguish the actual poses from the fake ones. If the pose generator produces results that the discriminator cannot distinguish from the real ones, then the model has successfully learned the priors. In the proposed model, the discriminator differentiates the ground-truth heatmaps from the generated ones, and the adversarial loss is then back-propagated to the generator. This procedure helps the generator learn reasonable body configurations and is shown to improve pose estimation accuracy. Meanwhile, we propose a novel function for MIL. It is an adjustable structure for both instance selection and modeling that passes information appropriately between instances in a single bag. In the proposed residual MIL neural network, the pooling action adequately updates each instance's contribution to its bag. The proposed pooling-based adversarial residual multi-instance neural network has been validated on two datasets for the human pose estimation task and outperforms other state-of-the-art models. 
The code will be made available on https:\/\/github.com\/pshams55\/AMIL.<\/jats:p>","DOI":"10.1145\/3355612","type":"journal-article","created":{"date-parts":[[2020,5,4]],"date-time":"2020-05-04T07:01:36Z","timestamp":1588575696000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["AMIL"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0263-1661","authenticated-orcid":false,"given":"Pourya","family":"Shamsolmoali","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masoumeh","family":"Zareapoor","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huiyu","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Leicester, Leicester, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jie","family":"Yang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,4,17]]},"reference":[{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909)","author":"Andriluka M.","key":"e_1_2_1_1_1","unstructured":"M. Andriluka , S. Roth , and B. Schiele . 2009. Pictorial structures revisited: People detection and articulated pose estimation . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909) . 1014--1021. M. Andriluka, S. Roth, and B. Schiele. 2009. Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909). 
1014--1021."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908)","author":"Felzenszwalb P. F.","key":"e_1_2_1_2_1","unstructured":"P. F. Felzenszwalb , D. A. McAllester , and D. Ramanan . 2008. A discriminatively trained, multiscale, deformable part model . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908) . P. F. Felzenszwalb, D. A. McAllester, and D. Ramanan. 2008. A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908)."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911)","author":"Johnson S.","key":"e_1_2_1_3_1","unstructured":"S. Johnson and M. Everingham . 2011. Learning effective human pose estimation from inaccurate annotation . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911) . 1465--1472. S. Johnson and M. Everingham. 2011. Learning effective human pose estimation from inaccurate annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911). 1465--1472."},{"volume-title":"Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition (FG\u201917)","author":"Belagiannis V.","key":"e_1_2_1_4_1","unstructured":"V. Belagiannis and A. Zisserman . 2017. Recurrent human pose estimation . In Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition (FG\u201917) . 468--475. V. Belagiannis and A. Zisserman. 2017. Recurrent human pose estimation. In Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition (FG\u201917). 468--475."},{"volume-title":"Proceedings of the IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 
65--72","author":"Doll\u00e1r P.","key":"e_1_2_1_5_1","unstructured":"P. Doll\u00e1r , V. Rabaud , G. Cottrell , and S. Belongie . 2005. Behavior recognition via sparse spatio-temporal features . In Proceedings of the IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 65--72 . P. Doll\u00e1r, V. Rabaud, G. Cottrell, and S. Belongie. 2005. Behavior recognition via sparse spatio-temporal features. In Proceedings of the IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 65--72."},{"volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201916)","author":"Bulat A.","key":"e_1_2_1_6_1","unstructured":"A. Bulat and G. Tzimiropoulos . 2016. Human pose estimation via convolutional part heatmap regression . In Proceedings of the European Conference on Computer Vision (ECCV\u201916) . 717--732. A. Bulat and G. Tzimiropoulos. 2016. Human pose estimation via convolutional part heatmap regression. In Proceedings of the European Conference on Computer Vision (ECCV\u201916). 717--732."},{"volume-title":"Proceedings of the International Conference on Pattern Recognition (ICPR\u201904)","author":"Schuldt C.","key":"e_1_2_1_7_1","unstructured":"C. Schuldt , I. Laptev , and B. Caputo . 2004. Recognizing human actions: A local SVM approach . In Proceedings of the International Conference on Pattern Recognition (ICPR\u201904) . 3, 32--36. C. Schuldt, I. Laptev, and B. Caputo. 2004. Recognizing human actions: A local SVM approach. In Proceedings of the International Conference on Pattern Recognition (ICPR\u201904). 3, 32--36."},{"key":"e_1_2_1_8_1","first-page":"1","article-title":"SKEPRID: Pose and illumination change-resistant skeleton-based person re-identification","volume":"4","author":"Yu T.","year":"2018","unstructured":"T. Yu , H. Jin , W. T. Tan , and K. Nahrstedt . 2018 . SKEPRID: Pose and illumination change-resistant skeleton-based person re-identification . 
ACM Trans. Multimedia Comput. Commun. 4 , 82 (2018), 1 -- 24 . T. Yu, H. Jin, W. T. Tan, and K. Nahrstedt. 2018. SKEPRID: Pose and illumination change-resistant skeleton-based person re-identification. ACM Trans. Multimedia Comput. Commun. 4, 82 (2018), 1--24.","journal-title":"ACM Trans. Multimedia Comput. Commun."},{"key":"e_1_2_1_9_1","first-page":"1","article-title":"Spatially coherent feature learning for pose-invariant facial expression recognition","volume":"1","author":"Zhang F.","year":"2018","unstructured":"F. Zhang , Q. Mao , X. Shen , Y. Zhan , and M. Dong . 2018 . Spatially coherent feature learning for pose-invariant facial expression recognition . ACM Trans. Multimedia Comput. Commun. 1 , 27 (2018), 1 -- 19 . F. Zhang, Q. Mao, X. Shen, Y. Zhan, and M. Dong. 2018. Spatially coherent feature learning for pose-invariant facial expression recognition. ACM Trans. Multimedia Comput. Commun. 1, 27 (2018), 1--19.","journal-title":"ACM Trans. Multimedia Comput. Commun."},{"key":"e_1_2_1_10_1","first-page":"1","article-title":"Joint head attribute classifier and domain-specific refinement networks for face alignment","volume":"4","author":"Zhang J.","year":"2018","unstructured":"J. Zhang and H. Hu . 2018 . Joint head attribute classifier and domain-specific refinement networks for face alignment . ACM Trans. Multimedia Comput. Commun. 4 , 79 (2018), 1 -- 19 . J. Zhang and H. Hu. 2018. Joint head attribute classifier and domain-specific refinement networks for face alignment. ACM Trans. Multimedia Comput. Commun. 4, 79 (2018), 1--19.","journal-title":"ACM Trans. Multimedia Comput. Commun."},{"volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201916)","author":"Newell A.","key":"e_1_2_1_11_1","unstructured":"A. Newell , K. Yang , and J. Deng . 2016. Stacked hourglass networks for human pose estimation . In Proceedings of the European Conference on Computer Vision (ECCV\u201916) . 483--449. A. Newell, K. Yang, and J. Deng. 2016. 
Stacked hourglass networks for human pose estimation. In Proceedings of the European Conference on Computer Vision (ECCV\u201916). 483--449."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914)","author":"Tompson J. J.","year":"2014","unstructured":"J. J. Tompson , A. Jain , Y. LeCun , and C. Bregler . 2014. Joint training of a convolutional network and a graphical model for human pose estimation . In Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914) . 1799-- 1807 . J. J. Tompson, A. Jain, Y. LeCun, and C. Bregler. 2014. Joint training of a convolutional network and a graphical model for human pose estimation. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914). 1799--1807."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision (ICCV\u201917)","author":"Chen Y.","key":"e_1_2_1_13_1","unstructured":"Y. Chen , C. Shen , X. S. Wei , L. Liu , and J. Yang . 2017. Adversarial PoseNet: A structure-aware convolutional network for human pose estimation . In Proceedings of the IEEE Conference on Computer Vision (ICCV\u201917) . 1212--1221. Y. Chen, C. Shen, X. S. Wei, L. Liu, and J. Yang. 2017. Adversarial PoseNet: A structure-aware convolutional network for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision (ICCV\u201917). 1212--1221."},{"key":"e_1_2_1_14_1","unstructured":"A. Radford L. Metz and S. Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. ArXiv Preprint ArXiv 1511.06434:1--16.  A. Radford L. Metz and S. Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. 
ArXiv Preprint ArXiv 1511.06434:1--16."},{"volume-title":"Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201916)","author":"Salimans T.","key":"e_1_2_1_15_1","unstructured":"T. Salimans , I. J. Goodfellow , W. Zaremba , V. Cheung , A. Radford , and X. Chen . 2016. Improved techniques for training GANs . In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201916) . 2226--2234. T. Salimans, I. J. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. 2016. Improved techniques for training GANs. In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201916). 2226--2234."},{"volume-title":"Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201915)","author":"Denton E. L.","key":"e_1_2_1_16_1","unstructured":"E. L. Denton , S. Chintala , A. Szlam , and R. Fergus . 2015. Deep generative image models using a Laplacian pyramid of adversarial networks . In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201915) . 1486--1494. E. L. Denton, S. Chintala, A. Szlam, and R. Fergus. 2015. Deep generative image models using a Laplacian pyramid of adversarial networks. In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201915). 1486--1494."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Chou C. J.","key":"e_1_2_1_17_1","unstructured":"C. J. Chou , J. T. Chien , and H. T. Chen . 2017. Self adversarial training for human pose estimation . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917) . C. J. Chou, J. T. Chien, and H. T. Chen. 2017. Self adversarial training for human pose estimation. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)."},{"key":"e_1_2_1_18_1","unstructured":"M. Ravanbakhsh E. Sangineto M. Nabi and N. Sebe. 2017. Training adversarial discriminators for cross-channel abnormal event detection in crowds. CoRR abs\/1706.07680 2017.  M. Ravanbakhsh E. Sangineto M. Nabi and N. Sebe. 2017. Training adversarial discriminators for cross-channel abnormal event detection in crowds. CoRR abs\/1706.07680 2017."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"volume-title":"Proceedings of the International Conference on Neural Information Processing Systems. 5769--5779","author":"Gulrajani I.","key":"e_1_2_1_20_1","unstructured":"I. Gulrajani , F. Ahmed , M. Arjovsky , V. Dumoulin , and A. Courville . 2017. Improved training of Wasserstein GANs . In Proceedings of the International Conference on Neural Information Processing Systems. 5769--5779 . I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. 2017. Improved training of Wasserstein GANs. In Proceedings of the International Conference on Neural Information Processing Systems. 5769--5779."},{"key":"e_1_2_1_21_1","unstructured":"J. Deng A. Berg S. Satheesh H. Su A. Khosla and L. FeiFei. 2012. Imagenet large scale visual recognition competition. Retrieved from http:\/\/www.image-net.org\/ challenges\/LSVRC\/2012\/.  J. Deng A. Berg S. Satheesh H. Su A. Khosla and L. FeiFei. 2012. Imagenet large scale visual recognition competition. Retrieved from http:\/\/www.image-net.org\/ challenges\/LSVRC\/2012\/."},{"volume-title":"Proceedings of the International Conference on Machine Learning (PMLR\u201918)","author":"Ilse M.","key":"e_1_2_1_22_1","unstructured":"M. Ilse , J. M. Tomczak , and M. Welling . 2018. Attention-based deep multiple instance learning . In Proceedings of the International Conference on Machine Learning (PMLR\u201918) . M. Ilse, J. M. Tomczak, and M. Welling. 2018. 
Attention-based deep multiple instance learning. In Proceedings of the International Conference on Machine Learning (PMLR\u201918)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.284"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914)","author":"Andriluka M.","key":"e_1_2_1_24_1","unstructured":"M. Andriluka , L. Pishchulin , P. V. Gehler , and B. Schiele . 2014. 2D human pose estimation: New benchmark and state-of-the-art analysis . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914) . 3686--3693. M. Andriluka, L. Pishchulin, P. V. Gehler, and B. Schiele. 2014. 2D human pose estimation: New benchmark and state-of-the-art analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914). 3686--3693."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","author":"Tompson J.","key":"e_1_2_1_25_1","unstructured":"J. Tompson , R. Goroshin , A. Jain , Y. LeCun , and C. Bregler . 2015. Efficient object localization using convolutional networks . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915) . 648--656. J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler. 2015. Efficient object localization using convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915). 648--656."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914)","author":"Toshev A.","key":"e_1_2_1_26_1","unstructured":"A. Toshev and C. Szegedy . 2014. DeepPose: Human pose estimation via deep neural networks . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914) . 1653--1660. A. Toshev and C. Szegedy. 2014. DeepPose: Human pose estimation via deep neural networks. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914). 1653--1660."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"G\u00fcler R. A.","key":"e_1_2_1_27_1","unstructured":"R. A. G\u00fcler , N. Neverova , and I. Kokkinos . 2018. DensePose: Dense human pose estimation in the wild . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918) . 7297--7306. R. A. G\u00fcler, N. Neverova, and I. Kokkinos. 2018. DensePose: Dense human pose estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). 7297--7306."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2018.02.003"},{"volume-title":"Proceedings of the European Conference in Computer Vision (ECCV)","author":"Papandreou G.","key":"e_1_2_1_29_1","unstructured":"G. Papandreou , T. Zhu , L. C. Chen , S. Gidaris , J. Tompson , and K. Murphy . 2018. PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model . In Proceedings of the European Conference in Computer Vision (ECCV). 282--299 . G. Papandreou, T. Zhu, L. C. Chen, S. Gidaris, J. Tompson, and K. Murphy. 2018. PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In Proceedings of the European Conference in Computer Vision (ECCV). 282--299."},{"volume-title":"Proceedings of the European Conference in Computer Vision (ECCV\u201916)","author":"Insafutdinov E.","key":"e_1_2_1_30_1","unstructured":"E. Insafutdinov , L. Pishchulin , B. Andres , M. Andriluka , and B. Schiele . 2016. Deepercut: A deeper, stronger, and faster multi-person pose estimation model . In Proceedings of the European Conference in Computer Vision (ECCV\u201916) . 34--50. E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. 
Schiele. 2016. Deepercut: A deeper, stronger, and faster multi-person pose estimation model. In Proceedings of the European Conference in Computer Vision (ECCV\u201916). 34--50."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Cao Z.","key":"e_1_2_1_31_1","unstructured":"Z. Cao , T. Simon , S. Wei , and Y. Sheikh . 2017. Realtime multi-person 2D pose estimation using part affinity fields . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917) . Z. Cao, T. Simon, S. Wei, and Y. Sheikh. 2017. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(96)00034-3"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision (ICCV\u201909)","author":"Hu Y.","key":"e_1_2_1_33_1","unstructured":"Y. Hu , L. Cao , F. Lv , S. Yan , Y. Gong , and T. S. Huang . 2009. Action detection in complex scenes with spatial and temporal ambiguities . In Proceedings of the IEEE Conference on Computer Vision (ICCV\u201909) . 128--135. Y. Hu, L. Cao, F. Lv, S. Yan, Y. Gong, and T. S. Huang. 2009. Action detection in complex scenes with spatial and temporal ambiguities. In Proceedings of the IEEE Conference on Computer Vision (ICCV\u201909). 128--135."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909)","author":"Babenko B.","key":"e_1_2_1_34_1","unstructured":"B. Babenko , M. H. Yang , and S. Belongie . 2009. Visual tracking with online multiple instance learning . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909) . 983--990. B. Babenko, M. H. Yang, and S. Belongie. 2009. Visual tracking with online multiple instance learning. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909). 983--990."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision (ICCV\u201917)","author":"Ronchi M. R.","key":"e_1_2_1_35_1","unstructured":"M. R. Ronchi and P. Perona . 2017. Benchmarking and error diagnosis in multi-instance pose estimation . In Proceedings of the IEEE Conference on Computer Vision (ICCV\u201917) . 369--378. M. R. Ronchi and P. Perona. 2017. Benchmarking and error diagnosis in multi-instance pose estimation. In Proceedings of the IEEE Conference on Computer Vision (ICCV\u201917). 369--378."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909)","author":"Babenko B.","key":"e_1_2_1_36_1","unstructured":"B. Babenko , M. H. Yang , and S. Belongie . 2009. Visual tracking with online multiple instance learning . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909) . 983--990. B. Babenko, M. H. Yang, and S. Belongie. 2009. Visual tracking with online multiple instance learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909). 983--990."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912)","author":"Yun K.","key":"e_1_2_1_37_1","unstructured":"K. Yun , J. Honorio , D. Chattopadhyay , T. L. Berg , and D. Samaras . 2012. Two-person interaction detection using body-pose features and multiple instance learning . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912) . 28--35. K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras. 2012. Two-person interaction detection using body-pose features and multiple instance learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912). 
28--35."},{"volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201915)","author":"Pathak D.","key":"e_1_2_1_38_1","unstructured":"D. Pathak , E. Shelhamer , J. Long , and T. Darrell . 2015. Fully convolutional multi-class multiple instance learning . In Proceedings of the International Conference on Learning Representations (ICLR\u201915) . D. Pathak, E. Shelhamer, J. Long, and T. Darrell. 2015. Fully convolutional multi-class multiple instance learning. In Proceedings of the International Conference on Learning Representations (ICLR\u201915)."},{"key":"e_1_2_1_39_1","first-page":"1","article-title":"Large scale visual recognition through adaptation using joint representation and multiple instance learning","volume":"17","author":"Hoffman J.","year":"2016","unstructured":"J. Hoffman , D. Pathak , E. Tzeng , J. Long , S. Guadarrama , and T. Darrell . 2016 . Large scale visual recognition through adaptation using joint representation and multiple instance learning . J. Mach. Learn. Res. 17 (2016), 1 -- 31 . J. Hoffman, D. Pathak, E. Tzeng, J. Long, S. Guadarrama, and T. Darrell. 2016. Large scale visual recognition through adaptation using joint representation and multiple instance learning. J. Mach. Learn. Res. 17 (2016), 1--31.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2015.07.024"},{"volume-title":"Proceedings of the IEEE International Conference on Data Mining (ICDM\u201915)","author":"Zeng T.","key":"e_1_2_1_41_1","unstructured":"T. Zeng and S. Ji . 2015. Deep convolutional neural networks for multi-instance multi-task learning . In Proceedings of the IEEE International Conference on Data Mining (ICDM\u201915) . 579--588. T. Zeng and S. Ji. 2015. Deep convolutional neural networks for multi-instance multi-task learning. In Proceedings of the IEEE International Conference on Data Mining (ICDM\u201915). 
579--588."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2567393"},{"volume-title":"Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914)","author":"Goodfellow I. J.","key":"e_1_2_1_43_1","unstructured":"I. J. Goodfellow , J. Pouget-Abadie , M. Mirza , B. Xu , D. Warde-Farley , S. Ozair , A. C. Courville , and Y. Bengio . 2014. Generative adversarial networks . In Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914) . 2672--2680. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. 2014. Generative adversarial networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS\u201914). 2672--2680."},{"key":"e_1_2_1_44_1","first-page":"10717","article-title":"BEGAN: Boundary equilibrium generative adversarial networks","volume":"1703","author":"Berthelot D.","year":"2017","unstructured":"D. Berthelot , T. Schumm , and L. Metz . 2017 . BEGAN: Boundary equilibrium generative adversarial networks . Arxiv Preprint Arxiv : 1703 . 10717 , 2017. D. Berthelot, T. Schumm, and L. Metz. 2017. BEGAN: Boundary equilibrium generative adversarial networks. Arxiv Preprint Arxiv:1703.10717, 2017.","journal-title":"Arxiv Preprint Arxiv"},{"key":"e_1_2_1_45_1","unstructured":"M. Mirza and S. Osindero. 2014. Conditional generative adversarial nets. CoRR abs\/1411.1784.  M. Mirza and S. Osindero. 2014. Conditional generative adversarial nets. CoRR abs\/1411.1784."},{"key":"e_1_2_1_46_1","unstructured":"P. Luc C. Couprie S. Chintala and J. Verbeek. 2016. Semantic segmentation using adversarial networks. CoRR abs\/1611.08408 2016.  P. Luc C. Couprie S. Chintala and J. Verbeek. 2016. Semantic segmentation using adversarial networks. 
CoRR abs\/1611.08408 2016."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Chu X.","key":"e_1_2_1_47_1","unstructured":"X. Chu , W. Yang , W. Ouyang , C. Ma , A. L. Yuille , and X. Wang . 2017. Multi-context attention for human pose estimation . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917) . 5669--5678. X. Chu, W. Yang, W. Ouyang, C. Ma, A. L. Yuille, and X. Wang. 2017. Multi-context attention for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 5669--5678."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Qi C. R.","key":"e_1_2_1_48_1","unstructured":"C. R. Qi , H. Su , K. Mo , and L. J. Guibas . 2017. Pointnet: Deep learning on point sets for 3D classification and segmentation . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917) . 652--660. C. R. Qi, H. Su, K. Mo, and L. J. Guibas. 2017. Pointnet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 652--660."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"He K.","key":"e_1_2_1_49_1","unstructured":"K. He , X. Zhang , S. Ren , and J. Sun . 2016. Deep residual learning for image recognition . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916) . 770--778. K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 
770--778."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.08.026"},{"volume-title":"Proceedings of the Conference on Machine Learning Research (ACML\u201918)","author":"Yan Y.","key":"e_1_2_1_51_1","unstructured":"Y. Yan , X. Wang , X. Guo , J. Fang , W. Liu , and J. Huang . 2018. Deep multi-instance learning with dynamic pooling . In Proceedings of the Conference on Machine Learning Research (ACML\u201918) . 80, 1--16. Y. Yan, X. Wang, X. Guo, J. Fang, W. Liu, and J. Huang. 2018. Deep multi-instance learning with dynamic pooling. In Proceedings of the Conference on Machine Learning Research (ACML\u201918). 80, 1--16."},{"volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Zhou Y.","key":"e_1_2_1_52_1","unstructured":"Y. Zhou , X. Sun , D. Liu , Z. Zha , and W. Zeng . 2017. Adaptive pooling in multi-instance learning for web video annotation . In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917) . 318--327. Y. Zhou, X. Sun, D. Liu, Z. Zha, and W. Zeng. 2017. Adaptive pooling in multi-instance learning for web video annotation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917). 318--327."},{"volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201914)","author":"Kingma D. P.","key":"e_1_2_1_53_1","unstructured":"D. P. Kingma and J. Ba . 2014. Adam: A method for stochastic optimization . In Proceedings of the International Conference on Learning Representations (ICLR\u201914) . 1--15. D. P. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR\u201914). 1--15."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911)","author":"Yang Y.","key":"e_1_2_1_54_1","unstructured":"Y. Yang and D. Ramanan . 2011. 
Articulated pose estimation with flexible mixtures-of-parts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911). 1385--1392."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909)","author":"Liu J.","year":"2009","unstructured":"J. Liu, J. Luo, and M. Shah. 2009. Recognizing realistic actions from videos \u201cin the Wild\u201d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909). 1996--2003."},{"volume-title":"Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201917)","author":"Sabour S.","key":"e_1_2_1_56_1","unstructured":"S. Sabour, N. Frosst, and G. E. Hinton. 2017. Dynamic routing between capsules. In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201917). 3859--3869."},{"volume-title":"Proceedings of the International Conference on Pattern Recognition (ICPR\u201916)","author":"Sun M.","key":"e_1_2_1_57_1","unstructured":"M. Sun, T. X. Han, M. C. Liu, and A. K. Rostamabad. 2016. Multiple instance learning convolutional neural networks for object recognition. In Proceedings of the International Conference on Pattern Recognition (ICPR\u201916). 3270--3275.
"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2586194"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2016.2539860"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1524473113"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2406194"},{"key":"e_1_2_1_62_1","doi-asserted-by":"crossref","unstructured":"J. M. Graving, D. Chae, H. Naik, L. Li, B. Koger, B. R. Costelloe, and I. D. Couzin. 2019. DeepPoseKit, a software toolkit for fast and robust animal pose estimation using deep learning. eLife 8 (2019), e47994.","DOI":"10.7554\/eLife.47994"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Chen Y.","key":"e_1_2_1_63_1","unstructured":"Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, and J. Sun. 2018. Cascaded pyramid network for multi-person pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). 7103--7112."},{"volume-title":"Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201917)","author":"Ma L.","key":"e_1_2_1_64_1","unstructured":"L. Ma, X. Jia, Q. Sun, B. Schiele, T. Tuytelaars, and L. V. Gool. 2017. 
Pose guided person image generation. In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201917). 406--416."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2901875"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.5080207"},{"volume-title":"Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201917)","author":"Nguyen T. D.","key":"e_1_2_1_67_1","unstructured":"T. D. Nguyen, T. Le, H. Vu, and D. Phung. 2017. Dual discriminator generative adversarial nets. In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS\u201917). 2670--2680."},{"volume-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201918)","author":"Hoang Q.","key":"e_1_2_1_68_1","unstructured":"Q. Hoang, T. D. Nguyen, T. Le, and D. Phung. 2018. MGAN: Training generative adversarial nets with multiple generators. In Proceedings of the International Conference on Learning Representations (ICLR\u201918)."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Chavdarova T.","key":"e_1_2_1_69_1","unstructured":"T. Chavdarova and F. Fleuret. 
2018. SGAN: An alternative training of generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). 9407--9415."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3355612","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3355612","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:29Z","timestamp":1750202009000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3355612"}},"subtitle":["Adversarial Multi-instance Learning for Human Pose Estimation"],"short-title":[],"issued":{"date-parts":[[2020,1,31]]},"references-count":69,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2020,1,31]]}},"alternative-id":["10.1145\/3355612"],"URL":"https:\/\/doi.org\/10.1145\/3355612","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2020,1,31]]},"assertion":[{"value":"2019-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-04-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}