{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:28:44Z","timestamp":1760149724200,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2023,9,3]],"date-time":"2023-09-03T00:00:00Z","timestamp":1693699200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Fundamental Research Funds for the Central Universities","award":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"],"award-info":[{"award-number":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"]}]},{"name":"Humanity and Social Science Youth Foundation of Ministry of Education of China","award":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"],"award-info":[{"award-number":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"]}]},{"name":"Hubei Province Social Science Fund General Project (subsequent funding)","award":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"],"award-info":[{"award-number":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"]}]},{"name":"Hubei Natural Science Foundation Youth Project","award":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"],"award-info":[{"award-number":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"],"award-info":[{"award-number":["2022WKYXZX019","22YJC890005","HBSK2022YB562","2023AFB359","G1323522067"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Human pose estimation is the basis of many downstream tasks, such as motor intervention, behavior understanding, and human\u2013computer interaction. The existing human pose estimation methods rely too much on the similarity of keypoints at the image feature level, which is vulnerable to three problems: object occlusion, keypoints ghost, and neighbor pose interference. We propose a dual-space-driven topology model for the human pose estimation task. Firstly, the model extracts relatively accurate keypoints features through a Transformer-based feature extraction method. Then, the correlation of keypoints in the physical space is introduced to alleviate the error localization problem caused by excessive dependence on the feature-level representation of the model. Finally, through the graph convolutional neural network, the spatial correlation of keypoints and the feature correlation are effectively fused to obtain more accurate human pose estimation results. The experimental results on real datasets also further verify the effectiveness of our proposed model.<\/jats:p>","DOI":"10.3390\/s23177626","type":"journal-article","created":{"date-parts":[[2023,9,4]],"date-time":"2023-09-04T02:59:55Z","timestamp":1693796395000},"page":"7626","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["DSPose: Dual-Space-Driven Keypoint Topology Modeling for Human Pose Estimation"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-0867-3397","authenticated-orcid":false,"given":"Anran","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9065-6493","authenticated-orcid":false,"given":"Jingli","family":"Li","sequence":"additional","affiliation":[{"name":"School of Physical Education, Huazhong University of Science and Technology, Wuhan 430074, China"},{"name":"Sport and Health Initiative, Optical Valley Laboratory, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongtao","family":"Zeng","sequence":"additional","affiliation":[{"name":"School of Physical Education, Huazhong University of Science and Technology, Wuhan 430074, China"},{"name":"Sport and Health Initiative, Optical Valley Laboratory, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongren","family":"Cheng","sequence":"additional","affiliation":[{"name":"Sports Big-Data Research Center, Wuhan Sports University, Wuhan 430079, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liangshan","family":"Dong","sequence":"additional","affiliation":[{"name":"School of Physical Education, China University of Geosciences, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. (2011, January 20\u201325). Real-time human pose recognition in parts from single depth images. Proceedings of the CVPR 2011, Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995316"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"7107","DOI":"10.1109\/TII.2022.3143605","article-title":"Arhpe: Asymmetric relation-aware representation learning for head pose estimation in industrial human\u2013computer interaction","volume":"18","author":"Liu","year":"2022","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"103740","DOI":"10.1016\/j.infrared.2021.103740","article-title":"Precise head pose estimation on HPD5A database for attention recognition based on convolutional neural network in human-computer interaction","volume":"116","author":"Liu","year":"2021","journal-title":"Infrared Phys. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Xiao, B., Wu, H., and Wei, Y. (2018, January 8\u201314). Simple baselines for human pose estimation and tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., and Sun, J. (2018, January 18\u201322). Cascaded pyramid network for multi-person pose estimation. Proceedings of the IEEE Conference on Computer vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00742"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sun, K., Xiao, B., Liu, D., and Wang, J. (2019, January 15\u201320). Deep high-resolution representation learning for human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"ref_7","unstructured":"Yuan, Y., Rao, F., Lang, H., Lin, W., Zhang, C., Chen, X., and Wang, J. (2021). Hrformer: High-resolution transformer for dense prediction. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, S., Wang, Z., Yang, S., Yang, W., Xia, S.T., and Zhou, E. (2021, January 11\u201317). Tokenpose: Learning keypoint tokens for human pose estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01112"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Yang, S., Quan, Z., Nie, M., and Yang, W. (2021, January 11\u201317). Transpose: Keypoint localization via transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01159"},{"key":"ref_10","first-page":"38571","article-title":"Vitpose: Simple vision transformer baselines for human pose estimation","volume":"35","author":"Xu","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_12","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_13","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 11\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhu, L., Wang, X., Ke, Z., Zhang, W., and Lau, R.W. (2023, January 17\u201324). BiFormer: Vision Transformer with Bi-Level Routing Attention. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00995"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Xu, L., Ouyang, W., Bennamoun, M., Boussaid, F., and Xu, D. (2022, January 18\u201324). Multi-class token transformer for weakly supervised semantic segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00427"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","article-title":"The graph neural network model","volume":"20","author":"Scarselli","year":"2008","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_20","unstructured":"Kipf, T.N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv."},{"key":"ref_21","unstructured":"Hamilton, W., Ying, Z., and Leskovec, J. (2017, January 4\u20139). Inductive representation learning on large graphs. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_22","unstructured":"Veli\u010dkovi\u0107, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3961","DOI":"10.1109\/TNNLS.2021.3055147","article-title":"Learning knowledge graph embedding with heterogeneous relation attention networks","volume":"33","author":"Li","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"4361","DOI":"10.1109\/TII.2021.3128240","article-title":"EDMF: Efficient deep matrix factorization with review feature learning for industrial recommender system","volume":"18","author":"Liu","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/j.neucom.2021.03.122","article-title":"CARM: Confidence-aware recommender model via review representation learning and historical rating behavior in the online platforms","volume":"455","author":"Li","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2335","DOI":"10.1109\/TKDE.2020.3005952","article-title":"Multi-scale dynamic convolutional network for knowledge graph embedding","volume":"34","author":"Zhang","year":"2020","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2449","DOI":"10.1109\/TMM.2021.3081873","article-title":"MFDNet: Collaborative poses perception and matrix Fisher distribution for head pose estimation","volume":"24","author":"Liu","year":"2021","journal-title":"IEEE Trans. Multimed."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ling, H., Gao, J., Kar, A., Chen, W., and Fidler, S. (2019, January 15\u201320). Fast interactive object annotation with curve-gcn. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00540"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, Z.M., Wei, X.S., Wang, P., and Guo, Y. (2019, January 15\u201320). Multi-label image recognition with graph convolutional networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00532"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., and Jiang, Y.G. (2018, January 8\u201314). Pixel2mesh: Generating 3d mesh models from single rgb images. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_4"},{"key":"ref_31","first-page":"8291","article-title":"Vision gnn: An image is worth graph of nodes","volume":"35","author":"Han","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","first-page":"101605","article-title":"GCANet: Geometry Cues-aware Facial Expression Recognition based on Graph Convolutional Networks","volume":"35","author":"Wang","year":"2023","journal-title":"J. King Saud-Univ.-Comput. Inf. Sci."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Tang, W., and Wu, Y. (2019, January 15\u201320). Does learning specific features for related parts help human pose estimation?. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00120"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Andriluka, M., Pishchulin, L., Gehler, P., and Schiele, B. (2014, January 23\u201328). 2d human pose estimation: New benchmark and state of the art analysis. Proceedings of the IEEE Conference on computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.471"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, F., Zhu, X., Dai, H., Ye, M., and Zhu, C. (2020, January 14\u201319). Distribution-aware coordinate representation for human pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00712"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Papandreou, G., Zhu, T., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C., and Murphy, K. (2017, January 21\u201326). Towards accurate multi-person pose estimation in the wild. Proceedings of the IEEE Conference on Computer vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.395"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sun, X., Xiao, B., Wei, F., Liang, S., and Wei, Y. (2018, January 8\u201314). Integral human pose regression. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_33"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Fang, H.S., Xie, S., Tai, Y.W., and Lu, C. (2017, January 22\u201329). Rmpe: Regional multi-person pose estimation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.256"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Li, K., Wang, S., Zhang, X., Xu, Y., Xu, W., and Tu, Z. (2021, January 20\u201325). Pose recognition with cascade transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00198"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Li, K., Wang, Y., Zhang, J., Gao, P., Song, G., Liu, Y., Li, H., and Qiao, Y. (2023). Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell., 1\u201318.","DOI":"10.1109\/TPAMI.2023.3282631"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Ye, S., Zhang, Y., Hu, J., Cao, L., Zhang, S., Shen, L., Wang, J., Ding, S., and Ji, R. (2023, January 17\u201324). DistilPose: Tokenized Pose Regression with Heatmap Distillation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00215"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhao, S., Liu, K., Huang, Y., Bao, Q., Zeng, D., and Liu, W. (2022, January 27\u201328). DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation. Proceedings of the CAAI International Conference on Artificial Intelligence, Beijing, China.","DOI":"10.1007\/978-3-031-20500-2_46"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Mao, W., Ge, Y., Shen, C., Tian, Z., Wang, X., and Wang, Z. (2021). Tfpose: Direct human pose estimation with transformers. arXiv.","DOI":"10.1007\/978-3-031-20068-7_5"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Li, Y., Yang, S., Liu, P., Zhang, S., Wang, Y., Wang, Z., Yang, W., and Xia, S.T. (2022, January 23\u201327). Simcc: A simple coordinate classification perspective for human pose estimation. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20068-7_6"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/17\/7626\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:45:37Z","timestamp":1760129137000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/17\/7626"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,3]]},"references-count":45,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["s23177626"],"URL":"https:\/\/doi.org\/10.3390\/s23177626","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,9,3]]}}}