{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T22:43:33Z","timestamp":1776811413364,"version":"3.51.2"},"reference-count":28,"publisher":"European Society of Computational Methods in Sciences and Engineering","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["JCM"],"published-print":{"date-parts":[[2021,12,7]]},"abstract":"<jats:p>Human pose estimate can be used in action recognition, video surveillance and other fields, which has received a lot of attentions. Since the flexibility of human joints and environmental factors greatly influence pose estimation accuracy, related research is confronted with many challenges. In this paper, we incorporate the pyramid convolution and attention mechanism into the residual block, and introduce a hybrid structure model which synthetically applies the local and global information of the image for the analysis of keypoints detection. In addition, our improved structure model adopts grouped convolution, and the attention module used is lightweight, which will reduce the computational cost of the network. Simulation experiments based on the MS COCO human body keypoints detection data set show that, compared with the Simple Baseline model, our model is similar in parameters and GFLOPs (giga floating-point operations per second), but the performance is better on the detection of accuracy under the multi-person scenes.<\/jats:p>","DOI":"10.3233\/jcm-215210","type":"journal-article","created":{"date-parts":[[2021,8,13]],"date-time":"2021-08-13T13:48:00Z","timestamp":1628862480000},"page":"1913-1923","source":"Crossref","is-referenced-by-count":2,"title":["A combined local and global structure module for human pose estimation"],"prefix":"10.66113","volume":"21","author":[{"given":"Zhihui","family":"Yang","sequence":"first","affiliation":[{"name":"Institute of Image Processing and Pattern Recognition, North China University of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiangyu","family":"Tang","sequence":"additional","affiliation":[{"name":"Institute of Image Processing and Pattern Recognition, North China University of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lijuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Image Processing and Pattern Recognition, North China University of Technology, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiling","family":"Yang","sequence":"additional","affiliation":[{"name":"905th Hospital of PLA Navy, Shanghai, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"55691","reference":[{"issue":"6","key":"10.3233\/JCM-215210_ref1","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagnet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Communications of the ACM"},{"key":"10.3233\/JCM-215210_ref2","doi-asserted-by":"crossref","unstructured":"A. Toshev and C. Szegedy, Deeppose: Human pose estimation via deep neural networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp.\u00a01653\u20131660.","DOI":"10.1109\/CVPR.2014.214"},{"issue":"12","key":"10.3233\/JCM-215210_ref3","doi-asserted-by":"crossref","first-page":"3007","DOI":"10.1109\/TPAMI.2017.2771306","article-title":"Skeleton-based action recognition using spatio-temporal lstm network with trust gates","volume":"40","author":"Liu","year":"2018","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"2\u20134","key":"10.3233\/JCM-215210_ref4","doi-asserted-by":"crossref","first-page":"410","DOI":"10.1007\/s11263-017-1026-6","article-title":"Joint estimation of human pose a conversational group from social scenes","volume":"126","author":"Varadarajan","year":"2018","journal-title":"International Journal of Computer Vision"},{"issue":"3","key":"10.3233\/JCM-215210_ref5","doi-asserted-by":"crossref","first-page":"387","DOI":"10.3233\/JCM-150551","article-title":"Fast pedestrian detection based on feature of local model","volume":"15","author":"Gu","year":"2015","journal-title":"Journal of Computational Methods in Sciences and Engineering"},{"key":"10.3233\/JCM-215210_ref6","doi-asserted-by":"crossref","unstructured":"A. Cherian, J. Mairal and K. Alahari, Mixing body-part sequences for human pose estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp.\u00a02353\u20132360.","DOI":"10.1109\/CVPR.2014.302"},{"key":"10.3233\/JCM-215210_ref8","doi-asserted-by":"crossref","unstructured":"H. Fang, S. Xie, Y. Tai and C. Lu, RMPE: Regional Multi-person Pose Estimation, 2017 IEEE International Conference on Computer Vision, 2017, pp.\u00a02353\u20132362.","DOI":"10.1109\/ICCV.2017.256"},{"key":"10.3233\/JCM-215210_ref10","unstructured":"X. Bin, W. Haiping and W. Yichen, Simple baselines for human pose estimation and tracking, European Conference on Computer Vision, 2018."},{"key":"10.3233\/JCM-215210_ref11","doi-asserted-by":"crossref","unstructured":"Y. Chen, Z. Wang and Y. Peng, Cascaded pyramid network for multi-person pose estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp.\u00a07103\u20137112.","DOI":"10.1109\/CVPR.2018.00742"},{"key":"10.3233\/JCM-215210_ref12","doi-asserted-by":"crossref","unstructured":"K. Sun, B. Xiao and D. Liu, Deep High-Resolution Representation Learning for Human Pose Estimation, Conference on Computer Vision and Pattern Recognition, 2019.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"10.3233\/JCM-215210_ref13","doi-asserted-by":"crossref","unstructured":"Z. Cao and T. Simon, Realtime multi-person 2d pose estimation using part affinity fields, Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp.\u00a07291\u20137299.","DOI":"10.1109\/CVPR.2017.143"},{"key":"10.3233\/JCM-215210_ref14","doi-asserted-by":"crossref","unstructured":"G. Papandreou, T. Zhu and L.C Chen, Personlab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model, Proceedings of the European Conference on Computer Vision, 2018, pp.\u00a0269\u2013286.","DOI":"10.1007\/978-3-030-01264-9_17"},{"key":"10.3233\/JCM-215210_ref15","unstructured":"A. Newell, Z. Huang and J. Deng, Associative embedding: End-to-end learning for joint detection and grouping, Advances in Neural Information Processing Systems, 2017, pp.\u00a02277\u20132287."},{"key":"10.3233\/JCM-215210_ref16","doi-asserted-by":"crossref","unstructured":"G. Pavlakos, X. Zhou and K.G. Derpanis, Coarse-to-fine volumetric prediction for single-image 3D human pose, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp.\u00a07025\u20137034.","DOI":"10.1109\/CVPR.2017.139"},{"key":"10.3233\/JCM-215210_ref17","doi-asserted-by":"crossref","unstructured":"D. Pavllo, C. Feichtenhofer and D. Grangier, 3D human pose estimation in video with temporal convolutions and semi-supervised training, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp.\u00a07753\u20137762.","DOI":"10.1109\/CVPR.2019.00794"},{"key":"10.3233\/JCM-215210_ref18","doi-asserted-by":"crossref","unstructured":"B. Wandt and B. Rosenhahn, Repnet: Weakly supervised training of an adversarial reprojection network for 3d human pose estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp.\u00a07782\u20137791.","DOI":"10.1109\/CVPR.2019.00797"},{"issue":"2","key":"10.3233\/JCM-215210_ref19","first-page":"82:1","article-title":"XNect: Real-time multi-person 3D motion capture with a single RGB camera","volume":"39","author":"Mehta","year":"2020","journal-title":"ACM Transactions on Graphics"},{"key":"10.3233\/JCM-215210_ref20","unstructured":"N. Alejandro, Y. Kaiyu and D. Jia, Stacked hourglass networks for human pose estimation, European Conference on Computer Vision, Springer International Publishing, 2016."},{"key":"10.3233\/JCM-215210_ref21","unstructured":"Z. Su, M. Ye and G. Zhang, Cascade feature aggregation for human pose estimation, 2019."},{"key":"10.3233\/JCM-215210_ref22","doi-asserted-by":"crossref","unstructured":"K. He, X. Zhang, S. Ren and J. Sun, Deep Residual Learning for Image Recognition, 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp.\u00a0770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"10.3233\/JCM-215210_ref23","doi-asserted-by":"crossref","unstructured":"B. Cheng, B. Xiao and J. Wang, HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation, Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp.\u00a05386\u20135395.","DOI":"10.1109\/CVPR42600.2020.00543"},{"key":"10.3233\/JCM-215210_ref24","unstructured":"I.C Duta, L. Liu and F. Zhu, Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition, 2020."},{"key":"10.3233\/JCM-215210_ref25","unstructured":"S. Ren, K. He and R. Girshick, Faster r-cnn: Towards real-time object detection with region proposal networks, Advances in neural information processing systems, 2015, pp.\u00a091\u201399."},{"key":"10.3233\/JCM-215210_ref26","doi-asserted-by":"crossref","unstructured":"J. Hu, L. Shen and G. Sun, Squeeze-and-excitation networks, Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp.\u00a07132\u20137141.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"10.3233\/JCM-215210_ref27","doi-asserted-by":"crossref","unstructured":"T. Lin, M. Maire and S.J. Belongie, Microsoft COCO: common objects in context, European Conference on Computer Vision, 2014, pp.\u00a0740\u2013755.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"10.3233\/JCM-215210_ref28","doi-asserted-by":"crossref","unstructured":"Y. Cao, J. Xu and S. Lin, Gcnet: Non-local networks meet squeeze-excitation networks and beyond, Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"10.3233\/JCM-215210_ref29","doi-asserted-by":"crossref","unstructured":"X. Wang, R. Girshick and A. Gupta, Non-local neural networks, IEEE Conference on Computer Vision and Pattern Recognition, 2018.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"10.3233\/JCM-215210_ref30","doi-asserted-by":"crossref","unstructured":"X. Chu, W. Yang and W. Ouyang, Multi-context attention for human pose estimation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp.\u00a01831\u20131840.","DOI":"10.1109\/CVPR.2017.601"}],"container-title":["Journal of Computational Methods in Sciences and Engineering"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JCM-215210","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T22:06:13Z","timestamp":1776809173000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JCM-215210"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,7]]},"references-count":28,"journal-issue":{"issue":"6"},"URL":"https:\/\/doi.org\/10.3233\/jcm-215210","relation":{},"ISSN":["1472-7978","1875-8983"],"issn-type":[{"value":"1472-7978","type":"print"},{"value":"1875-8983","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,7]]}}}