{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T15:10:13Z","timestamp":1761664213486,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2021,1,19]],"date-time":"2021-01-19T00:00:00Z","timestamp":1611014400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61462038, 61562039, 61502213, and 62062041"],"award-info":[{"award-number":["61462038, 61562039, 61502213, and 62062041"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Science and Technology Planning Project of Jiangxi Provincial Department of Education","award":["GJJ190217"],"award-info":[{"award-number":["GJJ190217"]}]},{"name":"the Open Project Program of the State Key Lab of CAD &amp; CG of Zhejiang University","award":["A2029"],"award-info":[{"award-number":["A2029"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Estimating accurate 3D hand pose from a single RGB image is a highly challenging problem in pose estimation due to self-geometric ambiguities, self-occlusions, and the absence of depth information. To this end, a novel Five-Layer Ensemble CNN (5LENet) is proposed based on hierarchical thinking, which is designed to decompose the hand pose estimation task into five single-finger pose estimation sub-tasks. Then, the sub-task estimation results are fused to estimate full 3D hand pose. The hierarchical method is of great benefit to extract deeper and better finger feature information, which can effectively improve the estimation accuracy of 3D hand pose. 
In addition, we also build a hand model with the center of the palm (represented as Palm) connected to the middle finger according to the topological structure of hand, which can further boost the performance of 3D hand pose estimation. Additionally, extensive quantitative and qualitative results on two public datasets demonstrate the effectiveness of 5LENet, yielding new state-of-the-art 3D estimation accuracy, which is superior to most advanced estimation methods.<\/jats:p>","DOI":"10.3390\/s21020649","type":"journal-article","created":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T03:34:25Z","timestamp":1611113665000},"page":"649","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["3D Hand Pose Estimation Based on Five-Layer Ensemble CNN"],"prefix":"10.3390","volume":"21","author":[{"given":"Lili","family":"Fan","sequence":"first","affiliation":[{"name":"School of Information Engineering, Nanchang University, Nanchang 330031, China"}]},{"given":"Hong","family":"Rao","sequence":"additional","affiliation":[{"name":"Center of Computer, Nanchang University, Nanchang 330031, China"}]},{"given":"Wenji","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Software, Jiangxi Agricultural University, Nanchang 330045, China"},{"name":"State Key Lab of CAD &amp; CG of Zhejiang University, Hangzhou 310058, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Huang, X.Y., Tsai, M.S., and Huang, C.C. (2019, January 20\u201322). 3D Virtual-Reality Interaction System. Proceedings of the 2019 IEEE International Conference on Consumer Electronics-Taiwan, ICCE-TW 2019, Ilan, Taiwan.","DOI":"10.1109\/ICCE-TW46550.2019.8991800"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hsieh, M.C., and Lee, J.J. (2018). 
Preliminary Study of VR and AR Applications in Medical and Healthcare Education. J. Nurs. Heal. Stud., 3.","DOI":"10.21767\/2574-2825.100030"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Mahesh, M.M., Jerry, M.S., and Wadala, D.S.B. (2017, January 19\u201320). FPGA based hand sign recognition system. Proceedings of the RTEICT 2017\u20132nd IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology, Bangalore, India.","DOI":"10.1109\/RTEICT.2017.8256604"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Antoshchuk, S., Kovalenko, M., and Sieck, J. (2018). Gesture recognition-based human-computer interaction interface for multimedia applications. Digitisation of Culture: Namibian and International Perspectives, Springer.","DOI":"10.1007\/978-981-10-7697-8_16"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2204","DOI":"10.1109\/TII.2020.2998818","article-title":"Visual Perception Enabled Industry Intelligence: State of the Art, Challenges and Prospects","volume":"17","author":"Yang","year":"2020","journal-title":"IEEE Trans. Ind. Inf."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Stenger, B., Thayananthan, A., Torr, P.H.S., and Cipolla, R. (2006). Model-based hand tracking using a hierarchical bayesian filter. IEEE Trans. Pattern Anal. Mach. Intell., 28.","DOI":"10.1109\/TPAMI.2006.189"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Tompson, J., Stein, M., Lecun, Y., and Perlin, K. (2014). Real-time continuous pose recovery of human hands using convolutional networks. ACM Trans. Graph., 33.","DOI":"10.1145\/2629500"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Sun, X., Wei, Y., Liang, S., Tang, X., and Sun, J. (2015, January 7\u201312). Cascaded hand pose regression. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298683"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ge, L., Liang, H., Yuan, J., and Thalmann, D. (2017, January 21\u201326). 3D convolutional neural networks for efficient and robust hand pose estimation from single depth images. Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.602"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wan, C., Probst, T., Van Gool, L., and Yao, A. (2017, January 21\u201326). Crossing nets: Combining gans and vaes with a shared latent space for hand pose estimation. Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.132"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Cai, Y., Ge, L., Cai, J., Magnenat-Thalmann, N., and Yuan, J. (2020). 3D Hand Pose Estimation Using Synthetic Data and Weakly Labeled RGB Images. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2020.2993627"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zimmermann, C., and Brox, T. (2017, January 22\u201329). Learning to Estimate 3D Hand Pose from Single RGB Images. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.525"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, X., Jiang, J., Guo, Y., Kang, L., Wei, Y., and Li, D. (2020). CFAM: Estimating 3D hand poses from a single RGB image with attention. Appl. Sci., 10.","DOI":"10.3390\/app10020618"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Spurr, A., Song, J., Park, S., and Hilliges, O. (2018, January 18\u201322). Cross-Modal Deep Variational Hand Pose Estimation. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00017"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Cai, Y., Ge, L., Cai, J., and Yuan, J. (2018, January 16\u201319). Weakly-supervised 3D hand pose estimation from monocular RGB images. Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Suzhou, China.","DOI":"10.1007\/978-3-030-01231-1_41"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Baek, S., Kim, K.I., and Kim, T.K. (2019, January 16\u201320). Pushing the envelope for rgb-based dense 3D hand pose estimation via neural rendering. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00116"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Guo, S., Rigall, E., Qi, L., Dong, X., Li, H., and Dong, J. (2020). Graph-based CNNs with self-supervised module for 3D hand pose estimation from Monocular RGB. IEEE Trans. Circuits Syst. Video Technol.","DOI":"10.1109\/TCSVT.2020.3004453"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ge, L., Liang, H., Yuan, J., and Thalmann, D. (2018). Robust 3D Hand Pose Estimation from Single Depth Images Using Multi-View CNNs. Proc. IEEE Trans. Image Process., 27.","DOI":"10.1109\/TIP.2018.2834824"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, L., Lin, S.Y., Xie, Y., Lin, Y.Y., Fan, W., and Xie, X. (2020, January 1\u20135). DGGAN: Depth-image guided generative adversarial networks for disentangling RGB and depth images in 3D hand pose estimation. 
Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093380"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., and Yuan, J. (2019, January 16\u201320). 3D hand shape and pose estimation from a single RGB image. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01109"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Guo, H., Wang, G., Chen, X., Zhang, C., Qiao, F., and Yang, H. (2017, January 17\u201320). Region ensemble network: Improving convolutional network for hand pose estimation. Proceedings of the International Conference on Image Processing, ICIP, Beijing, China.","DOI":"10.1109\/ICIP.2017.8297136"},{"key":"ref_22","unstructured":"Zhou, Y., Lu, J., Du, K., Lin, X., Sun, Y., and Ma, X. (2011, January 8\u201310). HBE: Hand branch ensemble network for real-time 3d hand pose estimation. Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Hong Kong, China."},{"key":"ref_23","unstructured":"Madadi, M., Escalera, S., Baro, X., and Gonzalez, J. (2017). End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth Data. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Du, K., Lin, X., Sun, Y., and Ma, X. (2019, January 16\u201320). Crossinfonet: Multi-task information sharing based hand pose estimation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01013"},{"key":"ref_25","first-page":"1","article-title":"Cascaded Hierarchical CNN for RGB-Based 3D Hand Pose Estimation","volume":"2020","author":"Dai","year":"2020","journal-title":"Math. Probl. 
Eng."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1016\/j.neucom.2018.06.097","article-title":"Pose guided structured region ensemble network for cascaded hand pose estimation","volume":"395","author":"Chen","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_27","first-page":"558","article-title":"Color image 3d gesture estimation based on cascade convolution neural network","volume":"41","author":"Liu","year":"2020","journal-title":"J. Chin. Comput. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1093\/nsr\/nwx105","article-title":"An overview of multi-task learning","volume":"5","author":"Zhang","year":"2018","journal-title":"Natl. Sci. Rev."},{"key":"ref_29","unstructured":"Wei, S.E., Ramakrishna, V., Kanade, T., and Sheikh, Y. (July, January 26). Convolutional pose machines. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; IEEE Computer Society, Las Vegas, NV, USA."},{"key":"ref_30","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). Very deep convolutional networks for large-scale image recognition. Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015-Conference Track Proceedings, San Diego, CA, USA."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Pfister, T., Charles, J., and Zisserman, A. (2015, January 7\u201313). Flowing ConvNets for Human Pose Estimation in Videos. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.222"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3258","DOI":"10.1109\/TCSVT.2018.2879980","article-title":"Mask-Pose Cascaded CNN for 2D Hand Pose Estimation from Single Color Image","volume":"29","author":"Wang","year":"2019","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"ref_33","unstructured":"Zhang, J., Jiao, J., Chen, M., Qu, L., Xu, X., and Yang, Q. (2016). 3D Hand Pose Tracking and Estimation Using Stereo Matching. arXiv."},{"key":"ref_34","unstructured":"Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., and Isard, M. (2016, January 2\u20134). TensorFlow: A system for large-scale machine learning. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA."},{"key":"ref_35","unstructured":"Kingma, D.P., and Ba, J.L. (2015, January 7\u20139). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015-Conference Track Proceedings; International Conference on Learning Representations, ICLR, San Diego, CA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Qian, C., Sun, X., Wei, Y., Tang, X., and Sun, J. (2014, January 23\u201328). Realtime and robust hand tracking from depth. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.145"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Panteleris, P., Oikonomidis, I., and Argyros, A. (2018, January 12\u201315). Using a Single RGB Frame for Real Time 3D Hand Pose Estimation in the Wild. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00054"},{"key":"ref_38","unstructured":"Oikonomidis, I., Kyriazis, N., and Argyros, A. (September, January 29). Efficient Model-Based 3D Tracking of Hand Articulations Using Kinect. Proceedings of the British Machine Vision Conference, Dundee, UK."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yang, L., and Yao, A. (2019, January 16\u201320). Disentangling latent hands for image synthesis and pose estimation. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01011"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Mueller, F., Bernard, F., Sotnychenko, O., Mehta, D., Sridhar, S., Casas, D., and Theobalt, C. (2018, January 18\u201322). GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00013"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Xiang, D., Joo, H., and Sheikh, Y. (2019, January 16\u201320). Monocular total capture: Posing face, body, and hands in the wild. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01122"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Iqbal, U., Molchanov, P., Breuel, T., Gall, J., and Kautz, J. (2018, January 14\u201316). Hand Pose Estimation via Latent 2.5D Heatmap Regression. 
Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Hong Kong, China.","DOI":"10.1007\/978-3-030-01252-6_8"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/2\/649\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:12:40Z","timestamp":1760159560000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/2\/649"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,19]]},"references-count":42,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["s21020649"],"URL":"https:\/\/doi.org\/10.3390\/s21020649","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,1,19]]}}}