{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T12:53:51Z","timestamp":1771332831214,"version":"3.50.1"},"reference-count":69,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2020,11,7]],"date-time":"2020-11-07T00:00:00Z","timestamp":1604707200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61876170"],"award-info":[{"award-number":["61876170"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["51805168"],"award-info":[{"award-number":["51805168"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"R&amp;D project of CRRC Zhuzhou Locomotive Co., LTD","award":["2018GY121"],"award-info":[{"award-number":["2018GY121"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["CUG170692"],"award-info":[{"award-number":["CUG170692"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Three-dimensional hand detection from a single RGB-D image is an important technology which supports many useful applications. Practically, it is challenging to robustly detect human hands in unconstrained environments because the RGB-D channels can be affected by many uncontrollable factors, such as light changes. 
To tackle this problem, we propose a 3D hand detection approach that improves robustness and accuracy by adaptively fusing the complementary features extracted from the RGB-D channels. Using the fused RGB-D feature, the 2D bounding boxes of hands are detected first, and then the 3D locations along the z-axis are estimated through a cascaded network. Furthermore, we present a challenging RGB-D hand detection dataset collected in unconstrained environments. Unlike previous works, which rely primarily on either the RGB or the D channel, we adaptively fuse the RGB-D channels for hand detection. Specifically, evaluation results show that the D-channel is crucial for hand detection in unconstrained environments. Our RGB-D fusion-based approach significantly improves the hand detection accuracy from 69.1 to 74.1 compared with one of the state-of-the-art RGB-based hand detectors. The existing RGB- or D-based methods are unstable in unseen lighting conditions: in dark conditions, the accuracy of the RGB-based method drops significantly to 48.9, and in back-light conditions, the accuracy of the D-based method drops dramatically to 28.3. 
Compared with these methods, our RGB-D fusion based approach is much more robust without accuracy degrading, and our detection results are 62.5 and 65.9, respectively, in these two extreme lighting conditions for accuracy.<\/jats:p>","DOI":"10.3390\/s20216360","type":"journal-article","created":{"date-parts":[[2020,11,8]],"date-time":"2020-11-08T19:03:37Z","timestamp":1604862217000},"page":"6360","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Robust 3D Hand Detection from a Single RGB-D Image in Unconstrained Environments"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5301-9376","authenticated-orcid":false,"given":"Chi","family":"Xu","sequence":"first","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"},{"name":"Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3111-3713","authenticated-orcid":false,"given":"Jun","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6514-4238","authenticated-orcid":false,"given":"Wendi","family":"Cai","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9615-4498","authenticated-orcid":false,"given":"Yunkai","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1696-3798","authenticated-orcid":false,"given":"Yongbo","family":"Li","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7980-9755","authenticated-orcid":false,"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"CRRC Zhuzhou Electric Locomotive Co., Ltd., Zhuzhou 412000, China"},{"name":"National Innovation Center of Advanced Rail Transit Equipment, Zhuzhou 412000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,11,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"19487","DOI":"10.3390\/s150819487","article-title":"Human-Computer Interaction in Smart Environments","volume":"15","author":"Gianluca","year":"2015","journal-title":"Sensors"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Xu, C., and Cheng, L. (2013, January 1\u20138). Efficient Hand Pose Estimation from a Single Depth Image. 
Proceedings of the International Conference on Computer Vision (ICCV), Darling Harbour, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.429"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1007\/s11263-017-0998-6","article-title":"Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups","volume":"123","author":"Xu","year":"2017","journal-title":"Int. J. Comput. Vis. (IJCV)"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., and Yuan, J. (2019, January 16\u201318). 3D Hand Shape and Pose Estimation From a Single RGB Image. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01109"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1109\/TPAMI.2005.61","article-title":"Real-time gesture recognition by learning and selective control of visual interest points","volume":"27","author":"Kirishima","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI)"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lin, H., Hsu, M., and Chen, W. (2014, January 18\u201322). Human hand gesture recognition using a convolution neural network. Proceedings of the International Conference on Automation Science and Engineering (CASE), Taipei, Taiwan.","DOI":"10.1109\/CoASE.2014.6899454"},{"key":"ref_7","unstructured":"Mittal, A., Zisserman, A., and Torr, P.H.S. (September, January 29). Hand detection using multiple proposals. Proceedings of the British Machine Vision Conference (BMVC), Dundee, UK."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Le, T.H.N., Quach, K.G., Zhu, C., Duong, C.N., Luu, K., and Savvides, M. (2017, January 21\u201326). Robust Hand Detection and Classification in Vehicles and in the Wild. 
Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.159"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1109\/TIP.2017.2779600","article-title":"Joint Hand Detection and Rotation Estimation Using CNN","volume":"27","author":"Deng","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_10","unstructured":"Narasimhaswamy, S., Wei, Z., Wang, Y., Zhang, J., and Hoai, M. (November, January 27). Contextual attention for hand detection in the wild. Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.1007\/s00138-019-01038-4","article-title":"An embedded implementation of CNN-based hand detection and orientation estimation algorithm","volume":"30","author":"Yang","year":"2019","journal-title":"Mach. Vis. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Xu, C., Cai, W., Li, Y., Zhou, J., and Wei, L. (2020). Accurate Hand Detection from Single-Color Images by Reconstructing Hand Appearances. Sensors, 20.","DOI":"10.3390\/s20010192"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Feng, R., Perez, C., and Zhang, H. (2017, January 16\u201319). Towards transferring grasping from human to robot with RGBD hand detection. Proceedings of the Conference on Computer and Robot Vision (CRV), Edmonton, AB, Canada.","DOI":"10.1109\/CRV.2017.45"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1016\/j.patcog.2017.08.009","article-title":"Hand action detection from ego-centric depth sequences with error-correcting Hough transform","volume":"72","author":"Xu","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Mees, O., Eitel, A., and Burgard, W. (2016, January 9\u201314). 
Choosing Smartly: Adaptive Multimodal Fusion for Object Detection in Changing Environments. Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea.","DOI":"10.1109\/IROS.2016.7759048"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1177\/0278364917713117","article-title":"RGB-D Object Detection and Semantic Segmentation for Autonomous Manipulation in Clutter","volume":"37","author":"Schwarz","year":"2018","journal-title":"Int. J. Robot. Res."},{"key":"ref_17","first-page":"9176","article-title":"ACM: Adaptive Cross-Modal Graph Convolutional Neural Networks for RGB-D Scene Recognition","volume":"33","author":"Yuan","year":"2019","journal-title":"Assoc. Adv. Artif. Intell. (AAAI)"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/j.ins.2018.09.040","article-title":"3D object detection: Learning 3D bounding boxes from scaled down 2D bounding boxes in RGB-D images","volume":"476","author":"Rahman","year":"2019","journal-title":"Inf. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1591","DOI":"10.1109\/TIP.2018.2878956","article-title":"Cross-Modal Attentional Context Learning for RGB-D Object Detection","volume":"28","author":"Li","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ophoff, T., Van Beeck, K., and Goedem\u00e9, T. (2019). Exploring RGB+Depth fusion for real-time object detection. Sensors, 19.","DOI":"10.3390\/s19040866"},{"key":"ref_21","unstructured":"Christian, Z., and Thomas, B. (2017, January 22\u201329). Learning to estimate 3D hand pose from single RGB images. 
Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"323","DOI":"10.2214\/ajr.154.2.2105024","article-title":"Masses of the hand and wrist: Detection and characterization with MR imaging","volume":"154","author":"Binkovitz","year":"1990","journal-title":"Am. J. Roentgenol."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"N\u00f6lker, C., and Ritter, H. (1998). Detection of fingertips in human hand movement sequences. Gesture and Sign Language in Human-Computer Interaction, Springer.","DOI":"10.1007\/BFb0053001"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1109\/TPAMI.2004.35","article-title":"Skin color-based video segmentation under time-varying illumination","volume":"26","author":"Sigal","year":"2004","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI)"},{"key":"ref_25","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of oriented gradients for human detection. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Meng, X., Lin, J., and Ding, Y. (2012, January 20\u201323). An extended HOG model: SCHOG for human hand detection. Proceedings of the International Conference on Systems and Informatics (ICSAI), L\u0105dek Zdr\u00f3j, Poland.","DOI":"10.1109\/ICSAI.2012.6223584"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Guo, J., Cheng, J., Pang, J., and Guo, Y. (2013, January 15\u201318). Real-time hand detection based on multi-stage HOG-SVM classifier. Proceedings of the International Conference on Image Processing (ICIP), Melbourne, Australia.","DOI":"10.1109\/ICIP.2013.6738846"},{"key":"ref_28","unstructured":"Del Solar, J.R., and Verschae, R. (2004, January 19). Skin detection using neighborhood information. 
Proceedings of the International Conference on Automatic Face and Gesture Recognition, Seoul, Korea."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Li, C., and Kitani, K.M. (2013, January 23\u201328). Pixel-Level Hand Detection in Ego-centric Videos. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.458"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1016\/j.neucom.2019.02.066","article-title":"Robust real-time hand detection and localization for space human\u2013robot interaction based on deep learning","volume":"390","author":"Gao","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, G., Luo, C., Sun, X., Xiong, Z., and Zeng, W. (2020, January 13\u201319). Tracking by instance detection: A meta-learning approach. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00632"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Kohli, P., and Shotton, J. (2013). Key developments in human pose estimation for kinect. Consumer Depth Cameras for Computer Vision, Springer.","DOI":"10.1007\/978-1-4471-4640-7_4"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Qian, C., Sun, X., Wei, Y., Tang, X., and Sun, J. (2014, January 24\u201327). Realtime and Robust Hand Tracking from Depth. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.145"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1007\/s11263-015-0826-9","article-title":"Estimate Hand Poses Efficiently from Single Depth Images","volume":"116","author":"Xu","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Oberweger, M., and Lepetit, V. (2017, January 22\u201329). 
Deepprior++: Improving fast and accurate 3d hand pose estimation. Proceedings of the International Conference on Computer Vision Workshops (ICCVW), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.75"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2629500","article-title":"Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks","volume":"33","author":"Tompson","year":"2014","journal-title":"ACM Trans. Graph."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Rogez, G., Khademi, M., Supan\u010di\u010d, J.S., Montiel, J.M.M., and Ramanan, D. (2015). 3D Hand Pose Detection in Egocentric RGB-D Images. European Conference on Computer Vision Workshops (ECCVW), Springer International Publishing.","DOI":"10.1007\/978-3-319-16178-5_25"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Gupta, S., Girshick, R., Arbel\u00e1ez, P., and Malik, J. (2014). Learning Rich Features from RGB-D Images for Object Detection and Segmentation. European Conference on Computer Vision (ECCV), Springer.","DOI":"10.1007\/978-3-319-10584-0_23"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum pointnets for 3D object detection from rgb-d data. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wang, C., Xu, D., Zhu, Y., Martin-Martin, R., Lu, C., Fei-Fei, L., and Savarese, S. (2019, January 16\u201320). DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion. 
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00346"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/j.ins.2018.02.024","article-title":"Deep attention network for joint hand gesture localization and recognition using static RGB-D images","volume":"441","author":"Li","year":"2018","journal-title":"Inf. Sci."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Gupta, S., Arbelaez, P., and Malik, J. (2013, January 23\u201328). Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.79"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3D Object Detection Network for Autonomous Driving. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhao, C., Sun, L., Purkait, P., Duckett, T., and Stolkin, R. (2018). Dense RGB-D Semantic Mapping with Pixel-Voxel Neural Network. Sensors, 18.","DOI":"10.3390\/s18093099"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Song, S., and Xiao, J. (2015, January 7\u201312). Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2016.94"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Xu, D., Anguelov, D., and Jain, A. (2018, January 18\u201322). PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation. 
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00033"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Peng, H., Li, B., Xiong, W., Hu, W., and Ji, R. (2014). RGBD Salient Object Detection: A Benchmark and Algorithms. European Conference on Computer Vision (ECCV), Springer.","DOI":"10.1007\/978-3-319-10578-9_7"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1016\/j.patcog.2017.07.026","article-title":"Multi-modal deep feature learning for RGB-D object detection","volume":"72","author":"Xu","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2017). FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture. Computer Vision\u2014ACCV 2016, Springer International Publishing.","DOI":"10.1007\/978-3-319-54181-5_14"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chen, H., and Li, Y. (2018, January 18\u201322). Progressively Complementarity-Aware Fusion Network for RGB-D Salient Object Detection. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00322"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1016\/j.patcog.2018.08.007","article-title":"Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection","volume":"86","author":"Chen","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Prabhakar, K.R., Srikar, V.S., and Babu, R.V. (2017, January 22\u201329). DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.505"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Zhao, J.X., Cao, Y., Fan, D.P., Cheng, M.M., Li, X.Y., and Zhang, L. (2019, January 16\u201320). Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00405"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.ins.2019.09.006","article-title":"Semantic Relation Extraction Using Sequential and Tree-structured LSTM with Attention","volume":"509","author":"Geng","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Misra, I., Shrivastava, A., Gupta, A., and Hebert, M. (2016, January 27\u201330). Cross-stitch networks for multi-task learning. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.433"},{"key":"ref_56","unstructured":"El, R.O., Rosman, G., Wetzler, A., Kimmel, R., and Bruckstein, A.M. (2015, January 7\u201312). RGBD-fusion: Real-time high precision depth recovery. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Bambach, S., Lee, S., Crandall, D.J., and Yu, C. (2015, January 7\u201313). Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions. Proceedings of the International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.226"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Martin, S., Yuen, K., and Trivedi, M.M. (2016, January 19\u201322). Vision for Intelligent Vehicles & Applications (VIVA): Face detection and head pose challenge. 
Proceedings of the Intelligent Vehicles Symposium (IV), Gotenburg, Sweden.","DOI":"10.1109\/IVS.2016.7535512"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Yuan, S., Ye, Q., Stenger, B., Jain, S., and Kim, T.K. (2017, January 21\u201316). BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.279"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas, D., and Theobalt, C. (2017, January 22\u201329). Real-time hand tracking under occlusion from an egocentric rgb-d sensor. Proceedings of the International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.82"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201316). Feature Pyramid Networks for Object Detection. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_62","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective Search for Object Recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"2189","DOI":"10.1109\/TPAMI.2012.28","article-title":"Measuring the Objectness of Image Windows","volume":"34","author":"Alexe","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.E., and Sheikh, Y. (2017, January 21\u201326). Realtime multi-person 2D pose estimation using Part Affinity Fields. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_67","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Khan, A.U., and Borji, A. (2018, January 18\u201322). Analysis of Hand Segmentation in the Wild. Proceedings of the Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00495"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Baek, S., Kim, K.I., and Kim, T.K. (2019, January 16\u201320). Pushing the Envelope for RGB-Based Dense 3D Hand Pose Estimation via Neural Rendering. 
Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00116"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/21\/6360\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:30:36Z","timestamp":1760178636000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/21\/6360"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,7]]},"references-count":69,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2020,11]]}},"alternative-id":["s20216360"],"URL":"https:\/\/doi.org\/10.3390\/s20216360","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,7]]}}}