{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T17:29:12Z","timestamp":1770917352120,"version":"3.50.1"},"reference-count":43,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2024,10,1]]},"DOI":"10.1587\/transinf.2024edp7048","type":"journal-article","created":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T22:13:05Z","timestamp":1727734385000},"page":"1332-1341","source":"Crossref","is-referenced-by-count":1,"title":["Multi-Scale Contrastive Learning for Human Pose Estimation"],"prefix":"10.1587","volume":"E107.D","author":[{"given":"Wenxia","family":"BAO","sequence":"first","affiliation":[{"name":"School of Electronics and Information Engineering, Anhui University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"An","family":"LIN","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Anhui University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hua","family":"HUANG","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Anhui University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xianjun","family":"YANG","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Anhui University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hemu","family":"CHEN","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Anhui University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"[1] T.Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. 
Doll\u00e1r, and C.L. Zitnick, \u201cMicrosoft coco: Common objects in context,\u201d Computer Vision-ECCV 2014: 13th European Conference, Zurich, Switzerland, Sept. 6-12, 2014, Proceedings, Part V 13, pp.740-755, 2014. 10.1007\/978-3-319-10602-1_48","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"2","unstructured":"[2] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, \u201cA simple framework for contrastive learning of visual representations,\u201d International conference on machine learning, pp.1597-1607, 2020."},{"key":"3","doi-asserted-by":"crossref","unstructured":"[3] J. Carreira, P. Agrawal, K. Fragkiadaki, and J. Malik, \u201cHuman pose estimation with iterative error feedback,\u201d 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.4733-4742, 2016. 10.1109\/CVPR.2016.512","DOI":"10.1109\/CVPR.2016.512"},{"key":"4","unstructured":"[4] J.B. Grill, F. Strub, F. Altch\u00e9, C. Tallec, P. Richemond, E. Buchatskaya, C. Doersch, B. Avila Pires, Z. Guo, and M. Gheshlaghi Azar, \u201cBootstrap your own latent-a new approach to self-supervised learning,\u201d Advances in neural information processing systems, pp.21271-21284, 2020."},{"key":"5","doi-asserted-by":"publisher","unstructured":"[5] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, and X. Wang, \u201cDeep high-resolution representation learning for visual recognition,\u201d IEEE Trans. Pattern Anal. Mach. Intell., vol.43, no.10, pp.3349-3364, 2020. 10.1109\/TPAMI.2020.2983686","DOI":"10.1109\/TPAMI.2020.2983686"},{"key":"6","doi-asserted-by":"crossref","unstructured":"[6] K. Sun, B. Xiao, D. Liu, and J. Wang, \u201cDeep high-resolution representation learning for human pose estimation,\u201d Proc. IEEE\/CVF conference on computer vision and pattern recognition, pp.5686-5696, 2019. 10.1109\/CVPR.2019.00584","DOI":"10.1109\/CVPR.2019.00584"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] H.-S. Fang, S. Xie, Y.-W. Tai, and C. 
Lu, \u201cRmpe: Regional multi-person pose estimation,\u201d Proc. IEEE international conference on computer vision, pp.2353-2362, 2017. 10.1109\/ICCV.2017.256","DOI":"10.1109\/ICCV.2017.256"},{"key":"8","doi-asserted-by":"crossref","unstructured":"[8] M. Fieraru, A. Khoreva, L. Pishchulin, and B. Schiele, \u201cLearning to refine human pose estimation,\u201d Proc. IEEE conference on computer vision and pattern recognition workshops, pp.318-31809, 2018. 10.1109\/CVPRW.2018.00058","DOI":"10.1109\/CVPRW.2018.00058"},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] S.-E. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh, \u201cConvolutional pose machines,\u201d Proc. IEEE conference on Computer Vision and Pattern Recognition, pp.4724-4732, 2016. 10.1109\/cvpr.2016.511","DOI":"10.1109\/CVPR.2016.511"},{"key":"10","doi-asserted-by":"publisher","unstructured":"[10] S.K. Yadav, A. Singh, A. Gupta, and J.L. Raheja, \u201cReal-time Yoga recognition using deep learning,\u201d Neural Computing and Applications, vol.31, no.12, pp.9349-9361, 2019. 10.1007\/s00521-019-04232-7","DOI":"10.1007\/s00521-019-04232-7"},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] A. Newell, K. Yang, and J. Deng, \u201cStacked hourglass networks for human pose estimation,\u201d Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, Oct. 11-14, 2016, Proceedings, Part VIII 14, vol.9912, pp.483-499, 2016. 10.1007\/978-3-319-46484-8_29","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"12","doi-asserted-by":"crossref","unstructured":"[12] S. Yang, Z. Quan, M. Nie, and W. Yang, \u201cTranspose: Keypoint localization via transformer,\u201d Proc. IEEE\/CVF International Conference on Computer Vision, pp.11802-11812, 2021. 10.1109\/iccv48922.2021.01159","DOI":"10.1109\/ICCV48922.2021.01159"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] Y. Li, S. Zhang, Z. Wang, S. Yang, W. Yang, S.-T. Xia, and E. 
Zhou, \u201cTokenpose: Learning keypoint tokens for human pose estimation,\u201d Proc. IEEE\/CVF International conference on computer vision, pp.11293-11302, 2021. 10.1109\/iccv48922.2021.01112","DOI":"10.1109\/ICCV48922.2021.01112"},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] K. Li, S. Wang, X. Zhang, Y. Xu, W. Xu, and Z. Tu, \u201cPose recognition with cascade transformers,\u201d Proc. IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp.1944-1953, 2021. 10.1109\/CVPR46437.2021.00198","DOI":"10.1109\/CVPR46437.2021.00198"},{"key":"15","doi-asserted-by":"crossref","unstructured":"[15] Z. Geng, C. Wang, Y. Wei, Z. Liu, H. Li, and H. Hu, \u201cHuman pose as compositional tokens,\u201d arXiv preprint arXiv:2303.11638, 2023.","DOI":"10.1109\/CVPR52729.2023.00071"},{"key":"16","unstructured":"[16] Y. Xu, J. Zhang, Q. Zhang, and D. Tao, \u201cVitpose: Simple vision transformer baselines for human pose estimation,\u201d arXiv preprint arXiv:2204.12484, 2022."},{"key":"17","unstructured":"[17] Y. Yuan, R. Fu, L. Huang, W. Lin, C. Zhang, X. Chen, and J. Wang, \u201cHrformer: High-resolution transformer for dense prediction,\u201d arXiv preprint arXiv:2110.09408, 2021."},{"key":"18","unstructured":"[18] J. Kim, H. Lee, J. Lim, J. Na, N. Kwak, and J.Y. Choi, \u201cPose-MUM: Reinforcing key points relationship for semi-supervised human pose estimation,\u201d arXiv preprint arXiv:2203.07837, 2022."},{"key":"19","doi-asserted-by":"crossref","unstructured":"[19] R. Xie, C. Wang, W. Zeng, and Y. Wang, \u201cAn empirical study of the collapsing problem in semi-supervised 2d human pose estimation,\u201d Proc. IEEE\/CVF International Conference on Computer Vision, pp.11220-11229, 2021. 10.1109\/ICCV48922.2021.01105","DOI":"10.1109\/ICCV48922.2021.01105"},{"key":"20","unstructured":"[20] X. Chen, H. Fan, R. Girshick, and K. 
He, \u201cImproved baselines with momentum contrastive learning,\u201d arXiv preprint arXiv:2003.04297, 2020."},{"key":"21","doi-asserted-by":"crossref","unstructured":"[21] X. Chen and K. He, \u201cExploring simple siamese representation learning,\u201d Proc. IEEE\/CVF conference on computer vision and pattern recognition, pp.15750-15758, 2021. 10.1109\/cvpr46437.2021.01549","DOI":"10.1109\/CVPR46437.2021.01549"},{"key":"22","doi-asserted-by":"crossref","unstructured":"[22] L. Ke, M.-C. Chang, H. Qi, and S. Lyu, \u201cMulti-scale structure-aware network for human pose estimation,\u201d Proc. european conference on computer vision (ECCV), vol.11206, pp.731-746, 2018. 10.1007\/978-3-030-01216-8_44","DOI":"10.1007\/978-3-030-01216-8_44"},{"key":"23","doi-asserted-by":"crossref","unstructured":"[23] H.Y. Zhou, C. Lu, C. Chen, S. Yang, and Y. Yu, \u201cPCRLv2: A unified visual information preservation framework for self-supervised pre-training in medical image analysis,\u201d arXiv preprint arXiv:2301.00772, 2023.","DOI":"10.1109\/TPAMI.2023.3234002"},{"key":"24","doi-asserted-by":"crossref","unstructured":"[24] Z. Zhao, J. Hu, Z. Zeng, X. Yang, P. Qian, B. Veeravalli, and C. Guan, \u201cMMGL: Multi-Scale Multi-View Global-Local Contrastive Learning for Semi-Supervised Cardiac Image Segmentation,\u201d 2022 IEEE International Conference on Image Processing (ICIP), pp.401-405, 2022. 10.1109\/ICIP46576.2022.9897591","DOI":"10.1109\/ICIP46576.2022.9897591"},{"key":"25","doi-asserted-by":"crossref","unstructured":"[25] C.-Y. Hsieh, C.-J. Chang, F.-E. Yang, and Y.-C.F. Wang, \u201cSelf-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond,\u201d Proc. IEEE\/CVF Winter Conference on Applications of Computer Vision, pp.2695-2704, 2023. 10.1109\/WACV56688.2023.00272","DOI":"10.1109\/WACV56688.2023.00272"},{"key":"26","doi-asserted-by":"crossref","unstructured":"[26] E. Xie, J. Ding, W. Wang, X. Zhan, H. Xu, P. Sun, Z. Li, and P. 
Luo, \u201cDetco: Unsupervised contrastive learning for object detection,\u201d Proc. IEEE\/CVF International Conference on Computer Vision, pp.8372-8381, 2021. 10.1109\/ICCV48922.2021.00828","DOI":"10.1109\/ICCV48922.2021.00828"},{"key":"27","doi-asserted-by":"crossref","unstructured":"[27] A. Ziegler and Y.M. Asano, \u201cSelf-supervised learning of object parts for semantic segmentation,\u201d Proc. IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp.14482-14491, 2022. 10.1109\/CVPR52688.2022.01410","DOI":"10.1109\/CVPR52688.2022.01410"},{"key":"28","doi-asserted-by":"publisher","unstructured":"[28] S. Zhang, W. Wang, H. Li, and S. Zhang, \u201cBounding convolutional network for refining object locations,\u201d Neural Computing and Applications, vol.35, no.26, pp.19297-19313, 2023. 10.1007\/s00521-023-08782-9","DOI":"10.1007\/s00521-023-08782-9"},{"key":"29","doi-asserted-by":"crossref","unstructured":"[29] K. He, X. Zhang, S. Ren, and J. Sun, \u201cDeep residual learning for image recognition,\u201d Proc. IEEE conference on computer vision and pattern recognition, pp.770-778, 2016. 10.1109\/CVPR.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"30","doi-asserted-by":"crossref","unstructured":"[30] T.-Y. Lin, P. Doll\u00e1r, R. Girshick, K. He, B. Hariharan, and S. Belongie, \u201cFeature pyramid networks for object detection,\u201d Proc. IEEE conference on computer vision and pattern recognition, pp.936-944, 2017. 10.1109\/CVPR.2017.106","DOI":"10.1109\/CVPR.2017.106"},{"key":"31","unstructured":"[31] P. Bachman, R.D. Hjelm, and W. Buchwalter, \u201cLearning representations by maximizing mutual information across views,\u201d Advances in neural information processing systems, pp.15535-15545, 2019."},{"key":"32","unstructured":"[32] G. Hinton, O. Vinyals, and J. Dean, \u201cDistilling the knowledge in a neural network,\u201d arXiv preprint arXiv:1503.02531, 2015."},{"key":"33","doi-asserted-by":"crossref","unstructured":"[33] Z. Ke, D. Wang, Q. 
Yan, J. Ren, and R.W. Lau, \u201cDual student: Breaking the limits of the teacher in semi-supervised learning,\u201d Proc. IEEE\/CVF International Conference on Computer Vision, pp.6727-6735, 2019. 10.1109\/ICCV.2019.00683","DOI":"10.1109\/ICCV.2019.00683"},{"key":"34","doi-asserted-by":"crossref","unstructured":"[34] K. He, G. Gkioxari, P. Doll\u00e1r, and R. Girshick, \u201cMask r-cnn,\u201d Proc. IEEE international conference on computer vision, pp.2980-2988, 2017. 10.1109\/ICCV.2017.322","DOI":"10.1109\/ICCV.2017.322"},{"key":"35","doi-asserted-by":"crossref","unstructured":"[35] X. Wang, R. Zhang, C. Shen, T. Kong, and L. Li, \u201cDense contrastive learning for self-supervised visual pre-training,\u201d Proc. IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp.3023-3032, 2021. 10.1109\/CVPR46437.2021.00304","DOI":"10.1109\/CVPR46437.2021.00304"},{"key":"36","doi-asserted-by":"crossref","unstructured":"[36] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, \u201cImagenet: A large-scale hierarchical image database,\u201d 2009 IEEE conference on computer vision and pattern recognition, pp.248-255, 2009. 10.1109\/CVPR.2009.5206848","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"37","doi-asserted-by":"crossref","unstructured":"[37] M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele, \u201c2d human pose estimation: New benchmark and state of the art analysis,\u201d Proc. IEEE Conference on computer Vision and Pattern Recognition, pp.3686-3693, 2014. 10.1109\/CVPR.2014.471","DOI":"10.1109\/CVPR.2014.471"},{"key":"38","unstructured":"[38] J. Wu, H. Zheng, B. Zhao, Y. Li, B. Yan, R. Liang, W. Wang, S. Zhou, G. Lin, and Y. Fu, \u201cAI challenger: A large-scale dataset for going deeper in image understanding,\u201d arXiv preprint arXiv:1711.06475, 2017."},{"key":"39","doi-asserted-by":"crossref","unstructured":"[39] B. Xiao, H. Wu, and Y. Wei, \u201cSimple baselines for human pose estimation and tracking,\u201d Proc. European conference on computer vision (ECCV), vol.11210, pp.472-487, 2018. 
10.1007\/978-3-030-01231-1_29","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"40","unstructured":"[40] D.P. Kingma and J. Ba, \u201cAdam: A method for stochastic optimization,\u201d arXiv preprint arXiv:1412.6980, 2014."},{"key":"41","unstructured":"[41] D.H. Lee, \u201cPseudo-label: The simple and efficient semi-supervised learning method for deep neural networks,\u201d Workshop on challenges in representation learning, ICML, vol.3, no.2, p.896, 2013."},{"key":"42","doi-asserted-by":"crossref","unstructured":"[42] I. Radosavovic, P. Doll\u00e1r, R. Girshick, G. Gkioxari, and K. He, \u201cData distillation: Towards omni-supervised learning,\u201d Proc. IEEE conference on computer vision and pattern recognition, pp.4119-4128, 2018. 10.1109\/CVPR.2018.00433","DOI":"10.1109\/CVPR.2018.00433"},{"key":"43","doi-asserted-by":"crossref","unstructured":"[43] R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, \u201cGrad-cam: Visual explanations from deep networks via gradient-based localization,\u201d Proc. IEEE international conference on computer vision, pp.618-626, 2017. 
10.1109\/ICCV.2017.74","DOI":"10.1109\/ICCV.2017.74"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E107.D\/10\/E107.D_2024EDP7048\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,5]],"date-time":"2024-10-05T03:46:04Z","timestamp":1728099964000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E107.D\/10\/E107.D_2024EDP7048\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,1]]},"references-count":43,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2024edp7048","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,1]]},"article-number":"2024EDP7048"}}