{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T23:30:08Z","timestamp":1780443008820,"version":"3.54.1"},"reference-count":50,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,12,29]],"date-time":"2019-12-29T00:00:00Z","timestamp":1577577600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China under Grants","award":["61876170"],"award-info":[{"award-number":["61876170"]}]},{"name":"National Natural Science Fund Youth Science Fund of China under Grant","award":["51805168"],"award-info":[{"award-number":["51805168"]}]},{"name":"Fundamental Research Funds for Central Universities, China University of Geosciences","award":["CUG170692"],"award-info":[{"award-number":["CUG170692"]}]},{"name":"R & D project of CRRC Zhuzhou Locomotive Co., LTD.","award":["2018GY121"],"award-info":[{"award-number":["2018GY121"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Hand detection is a crucial pre-processing procedure for many human hand related computer vision tasks, such as hand pose estimation, hand gesture recognition, human activity analysis, and so on. However, reliably detecting multiple hands from cluttering scenes remains to be a challenging task because of complex appearance diversities of dexterous human hands (e.g., different hand shapes, skin colors, illuminations, orientations, and scales, etc.) in color images. 
To tackle this problem, an accurate hand detection method is proposed to reliably detect multiple hands from a single color image using a hybrid detection\/reconstruction convolutional neural networks (CNN) framework, in which regions of hands are detected and appearances of hands are reconstructed in parallel by sharing features extracted from a region proposal layer, and the proposed model is trained in an end-to-end manner. Furthermore, it is observed that the generative adversarial network (GAN) could further boost the detection performance by generating more realistic hand appearances. The experimental results show that the proposed approach outperforms the state-of-the-art on public challenging hand detection benchmarks.<\/jats:p>","DOI":"10.3390\/s20010192","type":"journal-article","created":{"date-parts":[[2019,12,30]],"date-time":"2019-12-30T05:49:41Z","timestamp":1577684981000},"page":"192","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":21,"title":["Accurate Hand Detection from Single-Color Images by Reconstructing Hand Appearances"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5301-9376","authenticated-orcid":false,"given":"Chi","family":"Xu","sequence":"first","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6514-4238","authenticated-orcid":false,"given":"Wendi","family":"Cai","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1696-3798","authenticated-orcid":false,"given":"Yongbo","family":"Li","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3111-3713","authenticated-orcid":false,"given":"Jun","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2492-5510","authenticated-orcid":false,"given":"Longsheng","family":"Wei","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"},{"name":"Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2019,12,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, C., and Kitani, K.M. (2013, January 23\u201328). Pixel-level hand detection in ego-centric videos. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.458"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"19487","DOI":"10.3390\/s150819487","article-title":"Human-Computer Interaction in Smart Environments","volume":"15","author":"Paravati","year":"2015","journal-title":"Sensors"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1109\/TNSRE.2018.2814826","article-title":"Toward optimization of gaze-controlled human\u2013computer interaction: Application to hindi virtual keyboard for stroke patients","volume":"26","author":"Meena","year":"2018","journal-title":"IEEE Trans. Neural Syst. Rehabil. Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1007\/s11263-017-0998-6","article-title":"Lie-X: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups","volume":"123","author":"Xu","year":"2017","journal-title":"Int. J. Comput. Vis."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Xu, C., and Cheng, L. (2013, January 1\u20138). Efficient Hand Pose Estimation from a Single Depth Image. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.429"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., and Yuan, J. (2019, January 16\u201320). 3D Hand shape and pose estimation from a single RGB image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01109"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lin, H., Hsu, M., and Chen, W. (2014, January 18\u201322). Human hand gesture recognition using a convolution neural network. 
Proceedings of the IEEE International Conference on Automation Science and Engineering, Taipei, Taiwan.","DOI":"10.1109\/CoASE.2014.6899454"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1109\/TPAMI.2005.61","article-title":"Real-time gesture recognition by learning and selective control of visual interest points","volume":"27","author":"Kirishima","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27\u201330). Ntu rgb+ d: A large scale dataset for 3d human activity analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1109\/TPAMI.2004.35","article-title":"Skin color-based video segmentation under time-varying illumination","volume":"26","author":"Sigal","year":"2004","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of oriented gradients for human detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Guo, J., Cheng, J., Pang, J., and Guo, Y. (2013, January 15\u201318). Real-time hand detection based on multi-stage HOG-SVM classifier. Proceedings of the IEEE International Conference on Image Processing, Melbourne, Australia.","DOI":"10.1109\/ICIP.2013.6738846"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 13\u201316). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_15","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1888","DOI":"10.1109\/TIP.2017.2779600","article-title":"Joint hand detection and rotation estimation using CNN","volume":"27","author":"Deng","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Le, T.H.N., Quach, K.G., Zhu, C., Duong, C.N., Luu, K., and Savvides, M. (2017, January 21\u201326). Robust hand detection and classification in vehicles and in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.159"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yang, L., Qi, Z., Liu, Z., Liu, H., Ling, M., Shi, L., and Liu, X. (2019). An embedded implementation of CNN-based hand detection and orientation estimation algorithm. Mach. Vis. Appl., 1\u201312.","DOI":"10.1007\/s00138-019-01038-4"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1023\/A:1007379606734","article-title":"Multitask learning","volume":"28","author":"Caruana","year":"1997","journal-title":"Mach. Learn."},{"key":"ref_24","unstructured":"Kingma, D.P., and Welling, M. (2014, January 7\u20139). Auto-encoding variational bayes. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_25","unstructured":"Larsen, A.B.L., S\u00f8nderby, S.K., Larochelle, H., and Winther, O. (2016, January 20\u201322). Autoencoding beyond pixels using a learned similarity metric. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_26","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative adversarial nets. 
Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_27","unstructured":"Mittal, A., Zisserman, A., and Torr, P.H. (September, January 29). Hand detection using multiple proposals. Proceedings of the British Machine Vision Conference, University of Dundee, Dundee, UK."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Bambach, S., Lee, S., Crandall, D.J., and Yu, C. (2015, January 13\u201316). Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.226"},{"key":"ref_29","unstructured":"Kumaran, S.K., Dogra, D.P., Roy, P.P., and Mitra, A. (2018, December 18). Video Trajectory Classification and Anomaly Detection Using Hybrid CNN-VAE. Available online: https:\/\/arxiv.org\/pdf\/1812.07203.pdf."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S. (2017, January 22\u201325). Perceptual generative adversarial networks for small object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.211"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, X., Shrivastava, A., and Gupta, A. (2017, January 22\u201325). A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.324"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"061927","DOI":"10.1155\/ASP\/2006\/61927","article-title":"A human body analysis system","volume":"2006","author":"Girondel","year":"2006","journal-title":"EURASIP J. Adv. Signal Proc."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Karlinsky, L., Dinerstein, M., Harari, D., and Ullman, S. 
(2010, January 13\u201318). The chains model for detecting parts by their context. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540232"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A.A. (2017, January 4\u20139). Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 22\u201325). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_37","unstructured":"Qing, G., Jinguo, L., and Zhaojie, J. (2019). Robust real-time hand detection and localization for space human-robot interaction based on deep learning. Neurocomputing."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1016\/j.neucom.2019.05.064","article-title":"Improving novelty detection with generative adversarial networks on hand gesture data","volume":"358","author":"Miguel","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"He, W., Xie, Z., Li, Y., Wang, X., and Cai, W. (2019). Synthesizing Depth Hand Images with GANs and Style Transfer for Hand Pose Estimation. 
Sensors, 19.","DOI":"10.3390\/s19132919"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wan, C., Probst, T., Van Gool, L., and Yao, A. (2017, January 22\u201325). Crossing nets: Dual generative models with a shared latent space for hand pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.132"},{"key":"ref_41","unstructured":"Narasimhaswamy, S., Wei, Z., Wang, Y., Zhang, J., and Hoai, M. (November, January 27). Contextual Attention for Hand Detection in the Wild. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_42","unstructured":"Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017, January 24\u201326). beta-VAE: Learning basic visual concepts with a constrained variational framework. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_43","unstructured":"Van Den A\u00e4ron, O., Nal, K., and Koray, K. (2016, January 19\u201324). Pixel recurrent neural networks. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_44","unstructured":"Van Den A\u00e4ron, O., Nal, K., Oriol, V., Lasse, E., Alex, G., and Koray, K. (2016, January 5\u201310). Conditional image generation with PixelCNN decoders. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Tang, H., Wang, W., Xu, D., Yan, Y., and Sebe, N. (2018, January 22\u201326). Gesturegan for Hand Gesture-to-Gesture Translation in the Wild. Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea.","DOI":"10.1145\/3240508.3240704"},{"key":"ref_46","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 14\u201316). Very deep convolutional networks for large-scale image recognition. 
Proceedings of the International Conference on Learning Representations, Banff, AB, Canada."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective search for object recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"2189","DOI":"10.1109\/TPAMI.2012.28","article-title":"Measuring the objectness of image windows","volume":"34","author":"Alexe","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Neubeck, A., and Van Gool, L. (2006, January 20\u201324). Efficient non-maximum suppression. Proceedings of the International Conference on Pattern Recognition, Hong Kong, China.","DOI":"10.1109\/ICPR.2006.479"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. 
Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/1\/192\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:46:31Z","timestamp":1760190391000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/1\/192"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,29]]},"references-count":50,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2020,1]]}},"alternative-id":["s20010192"],"URL":"https:\/\/doi.org\/10.3390\/s20010192","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12,29]]}}}