{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:39:19Z","timestamp":1760150359424,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2023,11,10]],"date-time":"2023-11-10T00:00:00Z","timestamp":1699574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In the field of computer vision, hand pose estimation (HPE) has attracted significant attention from researchers, especially in the fields of human\u2013computer interaction (HCI) and virtual reality (VR). Despite advancements in 2D HPE, challenges persist due to hand dynamics and occlusions. Accurate extraction of hand features, such as edges, textures, and unique patterns, is crucial for enhancing HPE. To address these challenges, we propose SDFPoseGraphNet, a novel framework that combines the strengths of the VGG-19 architecture with spatial attention (SA), enabling a more refined extraction of deep feature maps from hand images. By incorporating the Pose Graph Model (PGM), the network adaptively processes these feature maps to provide tailored pose estimations. First Inference Module (FIM) potentials, alongside adaptively learned parameters, contribute to the PGM\u2019s final pose estimation. The SDFPoseGraphNet, with its end-to-end trainable design, optimizes across all components, ensuring enhanced precision in hand pose estimation. Our proposed model outperforms existing state-of-the-art methods, achieving an average precision of 7.49% against the Convolution Pose Machine (CPM) and 3.84% in comparison to the Adaptive Graphical Model Network (AGMN).<\/jats:p>","DOI":"10.3390\/s23229088","type":"journal-article","created":{"date-parts":[[2023,11,10]],"date-time":"2023-11-10T05:04:56Z","timestamp":1699592696000},"page":"9088","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["SDFPoseGraphNet: Spatial Deep Feature Pose Graph Network for 2D Hand Pose Estimation"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9344-6658","authenticated-orcid":false,"given":"Sartaj Ahmed","family":"Salman","sequence":"first","affiliation":[{"name":"Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3187-9551","authenticated-orcid":false,"given":"Ali","family":"Zakir","sequence":"additional","affiliation":[{"name":"Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan"}]},{"given":"Hiroki","family":"Takahashi","sequence":"additional","affiliation":[{"name":"Department of Informatics, Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan"},{"name":"Artificial Intelligence Exploration Research Center\/Meta-Networking Research Center, The University of Electro-Communications, Tokyo 182-8585, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2023,11,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Chen, W., Yu, C., Tu, C., Lyu, Z., Tang, J., Ou, S., Fu, Y., and Xue, Z. (2020). A Survey on Hand Pose Estimation with Wearable Sensors and Computer-Vision-Based Methods. Sensors, 20.","DOI":"10.3390\/s20041074"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"11488","DOI":"10.1109\/JSEN.2020.3018172","article-title":"Attention! A Lightweight 2D Hand Pose Estimation Approach","volume":"21","author":"Santavas","year":"2021","journal-title":"IEEE Sens. J."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Joo, H., Simon, T., Li, X., Liu, H., Tan, L., Gui, L., Banerjee, S., Godisart, T., Nabbe, B., and Matthews, I. (2016). Panoptic Studio: A Massively Multiview System for Social Interaction Capture. arXiv.","DOI":"10.1109\/ICCV.2015.381"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Simon, T., Joo, H., Matthews, I., and Sheikh, Y. (2017, January 21\u201326). Hand keypoint detection in single images using multiview bootstrapping. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.494"},{"key":"ref_5","unstructured":"Zhang, Z., Xie, S., Chen, M., and Zhu, H. (2020). HandAugment: A simple data augmentation method for depth-based 3D hand pose estimation. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ge, L., Cai, Y., Weng, J., and Yuan, J. (2018, January 18\u201323). Hand pointnet: 3D hand pose estimation using point sets. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00878"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yuan, S., Garcia-Hernando, G., Stenger, B., Moon, G., Chang, J.Y., Lee, K.M., Molchanov, P., Kautz, J., Honari, S., and Ge, L. (2018, January 18\u201323). Depth-based 3D hand pose estimation: From current achievements to future goals. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00279"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Cai, Y., Ge, L., Cai, J., and Yuan, J. (2018, January 14\u201318). Weakly-supervised 3D hand pose estimation from monocular rgb images. Proceedings of the European conference on computer vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_41"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Panteleris, P., Oikonomidis, I., and Argyros, A. (2018, January 12\u201315). Using a single rgb frame for real time 3D hand pose estimation in the wild. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00054"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Boukhayma, A., Bem, R.d., and Torr, P.H. (2019, January 15\u201320). 3D hand shape and pose from images in the wild. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01110"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., and Theobalt, C. (2018, January 18\u201323). Ganerated hands for real-time 3D hand tracking from monocular rgb. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00013"},{"key":"ref_12","unstructured":"Wei, S.E., Ramakrishna, V., Kanade, T., and Sheikh, Y. (July, January 26). Convolutional pose machines. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Song, J., Wang, L., Van Gool, L., and Hilliges, O. (2017, January 21\u201326). Thin-slicing network: A deep structured model for pose estimation in videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.590"},{"key":"ref_14","first-page":"1","article-title":"Joint training of a convolutional network and a graphical model for human pose estimation","volume":"27","author":"Tompson","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yang, W., Ouyang, W., Li, H., and Wang, X. (2016, January 27\u201330). End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.335"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Salman, S.A., Zakir, A., and Takahashi, H. (2023, January 9\u201311). Cascaded deep graphical convolutional neural network for 2D hand pose estimation. Proceedings of the International Workshop on Advanced Imaging Technology (IWAIT) 2023, SPIE, Jeju, Republic of Korea.","DOI":"10.1117\/12.2666956"},{"key":"ref_17","unstructured":"Zhu, X., Cheng, D., Zhang, Z., Lin, S., and Dai, J. (November, January 27). An empirical study of spatial attention mechanisms in deep networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3579","DOI":"10.1049\/iet-ipr.2019.0924","article-title":"Multi-view hand gesture recognition via pareto optimal front","volume":"14","author":"Sun","year":"2020","journal-title":"IET Image Process."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, Y., Jiang, J., Sun, J., and Wang, X. (2021). InterNet+: A Light Network for Hand Pose Estimation. Sensors, 21.","DOI":"10.3390\/s21206747"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sun, X., Wang, B., Huang, L., Zhang, Q., Zhu, S., and Ma, Y. (2021). CrossFuNet: RGB and Depth Cross-Fusion Network for Hand Pose Estimation. Sensors, 21.","DOI":"10.3390\/s21186095"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1016\/j.neucom.2018.06.097","article-title":"Pose guided structured region ensemble network for cascaded hand pose estimation","volume":"395","author":"Chen","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"4422","DOI":"10.1109\/TIP.2018.2834824","article-title":"Robust 3D hand pose estimation from single depth images using multi-view CNNs","volume":"27","author":"Ge","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"103200","DOI":"10.1016\/j.jvcir.2021.103200","article-title":"A CNN model for real time hand pose estimation","volume":"79","author":"Ding","year":"2021","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3258","DOI":"10.1109\/TCSVT.2018.2879980","article-title":"Mask-pose cascaded cnn for 2D hand pose estimation from single color image","volume":"29","author":"Wang","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kanis, J., Gruber, I., Kr\u0148oul, Z., Boh\u00e1\u010dek, M., Straka, J., and Hr\u00faz, M. (2023). MuTr: Multi-Stage Transformer for Hand Pose Estimation from Full-Scene Depth Image. Sensors, 23.","DOI":"10.3390\/s23125509"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zimmermann, C., and Brox, T. (2017, January 22\u201329). Learning to estimate 3D hand pose from single rgb images. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.525"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Guan, X., Shen, H., Nyatega, C.O., and Li, Q. (2023). Repeated Cross-Scale Structure-Induced Feature Fusion Network for 2D Hand Pose Estimation. Entropy, 25.","DOI":"10.3390\/e25050724"},{"key":"ref_28","unstructured":"Tekin, B., Rozantsev, A., Lepetit, V., and Fua, P. (July, January 26). Direct prediction of 3D body poses from motion compensated sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_29","unstructured":"Li, S., and Chan, A.B. (2015). Computer Vision\u2013ACCV 2014, Proceedings of the 12th Asian Conference on Computer Vision, Singapore, 1\u20135 November 2014, Springer. Revised Selected Papers, Part II 12."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wan, C., Probst, T., Van Gool, L., and Yao, A. (2018, January 18\u201323). Dense 3D regression for hand pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00540"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Awan, M.J., Masood, O.A., Mohammed, M.A., Yasin, A., Zain, A.M., Dama\u0161evi\u010dius, R., and Abdulkareem, K.H. (2021). Image-based malware classification using VGG19 network and spatial convolutional attention. Electronics, 10.","DOI":"10.3390\/electronics10192444"},{"key":"ref_32","unstructured":"Misra, D. (2019). Mish: A self regularized non-monotonic activation function. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kong, D., Chen, Y., Ma, H., Yan, X., and Xie, X. (2019). Adaptive graphical model network for 2D handpose estimation. arXiv.","DOI":"10.1109\/WACV45572.2020.9093638"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"106771","DOI":"10.1016\/j.knosys.2021.106771","article-title":"Image classification with deep learning in the presence of noisy labels: A survey","volume":"215","author":"Algan","year":"2021","journal-title":"Knowl.-Based Syst."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9088\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:20:55Z","timestamp":1760131255000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/22\/9088"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,10]]},"references-count":34,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["s23229088"],"URL":"https:\/\/doi.org\/10.3390\/s23229088","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,11,10]]}}}