{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T03:56:36Z","timestamp":1768622196981,"version":"3.49.0"},"reference-count":57,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T00:00:00Z","timestamp":1675123200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Sciences and Engineering Research Council of Canada","award":["NSERC. RGPIN-2022-03049"],"award-info":[{"award-number":["NSERC. RGPIN-2022-03049"]}]},{"name":"Natural Sciences and Engineering Research Council of Canada","award":["CPG-163986"],"award-info":[{"award-number":["CPG-163986"]}]},{"name":"Canadian Institutes of Health Research","award":["NSERC. RGPIN-2022-03049"],"award-info":[{"award-number":["NSERC. RGPIN-2022-03049"]}]},{"name":"Canadian Institutes of Health Research","award":["CPG-163986"],"award-info":[{"award-number":["CPG-163986"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This paper tackles a novel and challenging problem\u20143D hand pose estimation (HPE) from a single RGB image using partial annotation. Most HPE methods ignore the fact that the keypoints could be partially visible (e.g., under occlusions). In contrast, we propose a deep-learning framework, PA-Tran, that jointly estimates the keypoints status and 3D hand pose from a single RGB image with two dependent branches. The regression branch consists of a Transformer encoder which is trained to predict a set of target keypoints, given an input set of status, position, and visual features embedding from a convolutional neural network (CNN); the classification branch adopts a CNN for estimating the keypoints status. One key idea of PA-Tran is a selective mask training (SMT) objective that uses a binary encoding scheme to represent the status of the keypoints as observed or unobserved during training. In addition, by explicitly encoding the label status (observed\/unobserved), the proposed PA-Tran can efficiently handle the condition when only partial annotation is available. Investigating the annotation percentage ranging from 50\u2013100%, we show that training with partial annotation is more efficient (e.g., achieving the best 6.0 PA-MPJPE when using about 85% annotations). Moreover, we provide two new datasets. APDM-Hand, is for synthetic hands with APDM sensor accessories, which is designed for a specific hand task. PD-APDM-Hand, is a real hand dataset collected from Parkinson\u2019s Disease (PD) patients with partial annotation. The proposed PA-Tran can achieve higher estimation accuracy when evaluated on both proposed datasets and a more general hand dataset.<\/jats:p>","DOI":"10.3390\/s23031555","type":"journal-article","created":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T01:36:59Z","timestamp":1675215419000},"page":"1555","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["PA-Tran: Learning to Estimate 3D Hand Pose with Partial Annotation"],"prefix":"10.3390","volume":"23","author":[{"given":"Tianze","family":"Yu","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2978-0832","authenticated-orcid":false,"given":"Luke","family":"Bidulka","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4048-0817","authenticated-orcid":false,"given":"Martin J.","family":"McKeown","sequence":"additional","affiliation":[{"name":"Faculty of Medicine, University of British Columbia, Vancouver, BC V6T 1Z4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Z. Jane","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Chatzis, T., Stergioulas, A., Konstantinidis, D., Dimitropoulos, K., and Daras, P. (2020). A Comprehensive Study on Deep Learning-Based 3D Hand Pose Estimation Methods. Appl. Sci., 10.","DOI":"10.3390\/app10196850"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1007\/s10055-016-0301-0","article-title":"Hand posture and gesture recognition techniques for virtual reality applications: A survey","volume":"21","author":"Sagayam","year":"2017","journal-title":"Virtual Real."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Meier, M., Streli, P., Fender, A., and Holz, C. (April, January 27). TapID: Rapid touch interaction in virtual reality using wearable sensing. Proceedings of the 2021 IEEE Virtual Reality and 3D User Interfaces (VR), Lisboa, Portugal.","DOI":"10.1109\/VR50410.2021.00076"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Noreen, I., Hamid, M., Akram, U., Malik, S., and Saleem, M. (2021). Hand pose recognition using parallel multi stream CNN. Sensors, 21.","DOI":"10.3390\/s21248469"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1109\/THMS.2021.3086003","article-title":"Human-machine interaction sensing technology based on hand gesture recognition: A review","volume":"51","author":"Guo","year":"2021","journal-title":"IEEE Trans. Hum.-Mach. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1449","DOI":"10.1109\/TCDS.2021.3108136","article-title":"First-Person Hand Action Recognition Using Multimodal Data","volume":"14","author":"Li","year":"2021","journal-title":"IEEE Trans. Cogn. Dev. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"113794","DOI":"10.1016\/j.eswa.2020.113794","article-title":"Sign language recognition: A deep survey","volume":"164","author":"Rastgoo","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1007\/s11831-019-09384-2","article-title":"Sign language recognition systems: A decade systematic literature review","volume":"28","author":"Wadhawan","year":"2021","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_9","unstructured":"Microsoft (2023, January 30). Azure Kinect DK. Available online: https:\/\/azure.microsoft.com\/en-us\/products\/kinect-dk\/."},{"key":"ref_10","unstructured":"Luxonis (2023, January 30). Oak-D. Available online: https:\/\/shop.luxonis.com\/products\/oak-d."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Oberweger, M., and Lepetit, V. (2017, January 22\u201329). Deepprior++: Improving fast and accurate 3d hand pose estimation. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.75"},{"key":"ref_12","unstructured":"Zhang, Z., Xie, S., Chen, M., and Zhu, H. (2020). HandAugment: A simple data augmentation method for depth-based 3D hand pose estimation. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Rong, Z., Kong, D., Wang, S., and Yin, B. (December, January 30). RGB-D Hand Pose Estimation Using Fourier Descriptor. Proceedings of the 2018 7th International Conference on Digital Home (ICDH), Guilin, China.","DOI":"10.1109\/ICDH.2018.00018"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liu, Y., Zhang, S., and Gowda, M. (2021, January 19\u201323). NeuroPose: 3D Hand Pose Tracking using EMG Wearables. Proceedings of the Web Conference, Ljubljana, Slovenia.","DOI":"10.1145\/3442381.3449890"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chen, W., Yu, C., Tu, C., Lyu, Z., Tang, J., Ou, S., Fu, Y., and Xue, Z. (2020). A survey on hand pose estimation with wearable sensors and computer-vision-based methods. Sensors, 20.","DOI":"10.3390\/s20041074"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Toshev, A., and Szegedy, C. (2014, January 23\u201328). Deeppose: Human pose estimation via deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.214"},{"key":"ref_17","unstructured":"Wei, S.E., Ramakrishna, V., Kanade, T., and Sheikh, Y. (July, January 26). Convolutional pose machines. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Newell, A., Yang, K., and Deng, J. (2016, January 11\u201314). Stacked hourglass networks for human pose estimation. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Fan, L., Rao, H., and Yang, W. (2021). 3D Hand Pose Estimation Based on Five-Layer Ensemble CNN. Sensors, 21.","DOI":"10.3390\/s21020649"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Simon, T., Joo, H., Matthews, I., and Sheikh, Y. (2017, January 21\u201326). Hand keypoint detection in single images using multiview bootstrapping. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.494"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"2104","DOI":"10.1109\/TCSVT.2019.2912620","article-title":"Pose anchor: A single-stage hand keypoint detection network","volume":"30","author":"Li","year":"2019","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, Y., Jiang, J., and Sun, J. (2021, January 20\u201322). Hand Pose Estimation from RGB Images Based on Deep Learning: A Survey. Proceedings of the 2021 IEEE 7th International Conference on Virtual Reality (ICVR), Foshan, China.","DOI":"10.1109\/ICVR51878.2021.9483815"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zimmermann, C., and Brox, T. (2017, January 22\u201329). Learning to estimate 3D hand pose from single rgb images. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.525"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Doosti, B., Naha, S., Mirbagheri, M., and Crandall, D.J. (2020, January 14\u201319). Hope-net: A graph-based model for hand-object pose estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00664"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kanazawa, A., Black, M.J., Jacobs, D.W., and Malik, J. (2018, January 18\u201322). End-to-end recovery of human shape and pose. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00744"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Boukhayma, A., Bem, R.D., and Torr, P.H. (2019, January 15\u201320). 3D hand shape and pose from images in the wild. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01110"},{"key":"ref_27","unstructured":"Zimmermann, C., Ceylan, D., Yang, J., Russell, B., Argus, M., and Brox, T. (November, January 27). Freihand: A dataset for markerless capture of hand pose and shape from single rgb images. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Moon, G., and Lee, K.M. (2020, January 23\u201328). I2l-meshnet: Image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single rgb image. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58571-6_44"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lin, K., Wang, L., and Liu, Z. (2021, January 20\u201325). End-to-end human pose and mesh reconstruction with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zheng, J., Shi, X., Gorban, A., Mao, J., Song, Y., Qi, C.R., Liu, T., Chari, V., Cornman, A., and Zhou, Y. (2022, January 18\u201324). Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00494"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, J., Liu, L., Xu, W., Sarkar, K., Luvizon, D., and Theobalt, C. (2022, January 18\u201324). Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01281"},{"key":"ref_32","first-page":"3676","article-title":"Partial multi-label learning with noisy label identification","volume":"44","author":"Xie","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2473","DOI":"10.1109\/JBHI.2020.2970091","article-title":"CycleGAN with an improved loss function for cell detection using partly labeled images","volume":"24","author":"He","year":"2020","journal-title":"IEEE J. Biomed. Health Inform."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"101979","DOI":"10.1016\/j.media.2021.101979","article-title":"Marginal loss and exclusion loss for partially supervised multi-organ segmentation","volume":"70","author":"Shi","year":"2021","journal-title":"Med. Image Anal."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3739","DOI":"10.1109\/TPAMI.2020.2993627","article-title":"3D hand pose estimation using synthetic data and weakly labeled RGB images","volume":"43","author":"Cai","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","unstructured":"Abdi, M., Abbasnejad, E., Lim, C.P., and Nahavandi, S. (2018). 3D hand pose estimation using simulation and partial-supervision with a shared latent space. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chen, L., Lin, S.Y., Xie, Y., Lin, Y.Y., and Xie, X. (2021, January 3\u20138). Mvhm: A large-scale multi-view hand mesh benchmark for accurate 3d hand pose estimation. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00088"},{"key":"ref_38","unstructured":"Gao, D., Xiu, Y., Li, K., Yang, L., Wang, F., Zhang, P., Zhang, B., Lu, C., and Tan, P. (2022). DART: Articulated Hand Model with Diverse Accessories and Rich Textures. arXiv."},{"key":"ref_39","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. arXiv."},{"key":"ref_40","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Lanchantin, J., Wang, T., Ordonez, V., and Qi, Y. (2021, January 20\u201325). General Multi-label Image Classification with Transformers. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01621"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. arXiv.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_43","unstructured":"Kolotouros, N., Pavlakos, G., Black, M.J., and Daniilidis, K. (2, January 27). Learning to reconstruct 3D human pose and shape via model-fitting in the loop. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_44","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_45","unstructured":"Community, B.O. (2018). Blender\u2014A 3D Modelling and Rendering Package, Blender Foundation; Stichting Blender Foundation."},{"key":"ref_46","unstructured":"Haas, J.K. (2014). A History of the Unity Game Engine, Worcester Polytechnic Institute."},{"key":"ref_47","unstructured":"APDM (2023, January 30). OPAL Research-Grade Wearable Sensors. Available online: https:\/\/apdm.com\/wearable-sensors\/."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zimmermann, C., and Brox, T. (2017). Learning to Estimate 3D Hand Pose from Single RGB Images. arXiv.","DOI":"10.1109\/ICCV.2017.525"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., and Theobalt, C. (2018, January 18\u201322). Ganerated hands for real-time 3d hand tracking from monocular rgb. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00013"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M.J., Laptev, I., and Schmid, C. (2019, January 15\u201320). Learning joint reconstruction of hands and manipulated objects. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01208"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Movement Disorder Society Task Force on Rating Scales for Parkinson\u2019s Disease (2003). The unified Parkinson\u2019s disease rating scale (UPDRS): Status and recommendations. Mov. Disord., 18, 738\u2013750.","DOI":"10.1002\/mds.10473"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Quattoni, A., and Torralba, A. (2009, January 20\u201325). Recognizing indoor scenes. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206537"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1109\/TPAMI.2013.248","article-title":"Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments","volume":"36","author":"Ionescu","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Pavlakos, G., Zhu, L., Zhou, X., and Daniilidis, K. (2018, January 18\u201322). Learning to estimate 3D human pose and shape from a single color image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00055"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"3349","DOI":"10.1109\/TPAMI.2020.2983686","article-title":"Deep high-resolution representation learning for visual recognition","volume":"43","author":"Wang","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_57","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1555\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:20:11Z","timestamp":1760120411000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1555"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,31]]},"references-count":57,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23031555"],"URL":"https:\/\/doi.org\/10.3390\/s23031555","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,31]]}}}