{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:41:11Z","timestamp":1760240471994,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2019,7,1]],"date-time":"2019-07-01T00:00:00Z","timestamp":1561939200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61876170; 61503349; 61603357"],"award-info":[{"award-number":["61876170; 61503349; 61603357"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Hand pose estimation is a critical technology of computer vision and human-computer interaction. Deep-learning methods require a considerable amount of tagged data. Accordingly, numerous labeled training data are required. This paper aims to generate depth hand images. Given a ground-truth 3D hand pose, the developed method can generate depth hand images. To be specific, a ground truth can be 3D hand poses with the hand structure contained, while the synthesized image has an identical size to that of the training image and a similar visual appearance to the training set. The developed method, inspired by the progress in the generative adversarial network (GAN) and image-style transfer, helps model the latent statistical relationship between the ground-truth hand pose and the corresponding depth hand image. The images synthesized using the developed method are demonstrated to be feasible for enhancing performance. 
On public hand pose datasets (NYU, MSRA, ICVL), comprehensive experiments prove that the developed method outperforms the existing works.<\/jats:p>","DOI":"10.3390\/s19132919","type":"journal-article","created":{"date-parts":[[2019,7,1]],"date-time":"2019-07-01T10:54:47Z","timestamp":1561978487000},"page":"2919","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Synthesizing Depth Hand Images with GANs and Style Transfer for Hand Pose Estimation"],"prefix":"10.3390","volume":"19","author":[{"given":"Wangyong","family":"He","sequence":"first","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5754-3191","authenticated-orcid":false,"given":"Zhongzhao","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongbo","family":"Li","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xinmei","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wendi","family":"Cai","sequence":"additional","affiliation":[{"name":"School of Automation, China University of Geosciences, Wuhan 430074, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,7,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"911","DOI":"10.1109\/TNSRE.2018.2814826","article-title":"Toward Optimization of Gaze-Controlled Human-Computer Interaction: Application to Hindi Virtual Keyboard 
for Stroke Patients","volume":"26","author":"Meena","year":"2018","journal-title":"IEEE Trans. Neural Syst. Rehabil. Eng."},{"key":"ref_2","unstructured":"Preece, J. (1994). Human-Computer Interaction, Addison-Wesley Longman Ltd."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Supancic, J.S., Rogez, G., Yang, Y., Shotton, J., and Ramanan, D. (2015, January 13\u201316). Depth-Based Hand Pose Estimation: Data, Methods, and Challenges. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.217"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2629500","article-title":"Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks","volume":"33","author":"Tompson","year":"2014","journal-title":"ACM Trans. Graph."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tang, D., Taylor, J., Kohli, P., Keskin, C., Kim, T., and Shotton, J. (2015, January 13\u201316). Opening the Black Box: Hierarchical Sampling Optimization for Estimating Human Hand Pose. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.380"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ge, L., Liang, H., Yuan, J., and Thalmann, D. (2017, January 21\u201326). 3D Convolutional Neural Networks for Efficient and Robust Hand Pose Estimation from Single Depth Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.602"},{"key":"ref_7","first-page":"1","article-title":"The Improvement of DS Evidence Theory and Its Application in IR\/MMW Target Recognition","volume":"2016","author":"Li","year":"2016","journal-title":"J. Sensors"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Li, J., Qiu, T., Wen, C., Xie, K., and Wen, F.-Q. (2018). Robust Face Recognition Using the Deep C2D-CNN Model Based on Decision-Level Fusion. J. 
Sensors, 18.","DOI":"10.3390\/s18072080"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 8\u201310). Going Deeper with Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_10","unstructured":"Deng, X., Yang, S., Zhang, Y., Tan, P., Chang, L., and Wang, H. (2017, April 07). Hand3D: Hand Pose Estimation Using 3D Neural Network. Available online: https:\/\/arxiv.org\/pdf\/1704.02224.pdf."},{"key":"ref_11","unstructured":"Zhou, X., Wan, Q., Zhang, W., Xue, X., and Wei, Y. (2016, June 22). Model Based Deep Hand Pose Estimation. Available online: https:\/\/arxiv.org\/pdf\/1606.06854.pdf."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1016\/j.neucom.2017.04.014","article-title":"Multi-Task, Multi-Domain Learning: Application to Semantic Segmentation and Pose Regression","volume":"1","author":"Fourure","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.cviu.2017.10.006","article-title":"Hand Pose Estimation through Semi-Supervised and Weakly-Supervised Learning","volume":"164","author":"Neverova","year":"2017","journal-title":"Comput. Vision Image Understanding"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Xu, C., Govindarajan, L.N., Zhang, Y., and Cheng, L. (2017). LieX: Depth Image Based Articulated Object Pose Estimation, Tracking, and Action Recognition on Lie Groups. Int. J. Comput. Vision, 454\u2013478.","DOI":"10.1007\/s11263-017-0998-6"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1016\/j.jvcir.2018.04.005","article-title":"Region Ensemble Network: Towards Good Practices for Deep 3D Hand Pose Estimation","volume":"55","author":"Wang","year":"2018","journal-title":"J. 
Vision Commun. Image R"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Oberweger, M., and Lepetit, V. (2017, January 22\u201329). DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.75"},{"key":"ref_17","unstructured":"Chen, X., Wang, G., Guo, H., and Zhang, C. (2018, June 24). Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation. Available online: https:\/\/arxiv.org\/pdf\/1708.03416.pdf."},{"key":"ref_18","unstructured":"Yang, H., and Zhang, J. (2016, January 20\u201324). Hand Pose Regression via a Classification-Guided Approach. Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sun, X., Wei, Y., Liang, S., Tang, X., and Sun, J. (2015, January 8\u201310). Cascaded hand pose regression. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298683"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Tang, D., Jin Chang, H., Tejani, A., and Kim, T.-K. (2014, January 24\u201327). Latent regression forest: Structured estimation of 3d articulated hand posture. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.490"},{"key":"ref_21","unstructured":"Kingma, D., and Welling, M. (2014, January 14\u201316). Auto-encoding variational bayes. Proceedings of the International Conference on Learning Representations, Banff, AB, Canada."},{"key":"ref_22","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative adversarial nets. 
Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_23","unstructured":"Oord, A., Kalchbrenner, N., and Kavukcuoglu, K. (2016, January 19\u201324). Pixel recurrent neural networks. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_24","unstructured":"Radford, A., Metz, L., and Chintala, S. (2016, January 07). Unsupervised representation learning with deep convolutional generative adversarial networks. Available online: https:\/\/arxiv.org\/pdf\/1511.06434.pdf."},{"key":"ref_25","unstructured":"Mirza, M., and Osindero, S. (2014, November 06). Conditional generative adversarial nets. Available online: https:\/\/arxiv.org\/pdf\/1411.1784.pdf."},{"key":"ref_26","unstructured":"Denton, E., Chintala, S., and Fergus, R. (2015, January 7\u201312). Deep generative image models using a Laplacian pyramid of adversarial networks. Proceedings of the Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_27","unstructured":"Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. (2016, January 5\u201310). Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_28","unstructured":"Arjovsky, M., Chintala, S., and Bottou, L. (2017, December 06). Wasserstein GAN. Available online: https:\/\/arxiv.org\/pdf\/1701.07875.pdf."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Hertzmann, A., Jacobs, C., Oliver, N., Curless, B., and Salesin, D. (2001, January 26\u201330). Image analogies. Proceedings of the Conference on Computer Graphics and Interactive Techniques, New York, NY, USA.","DOI":"10.1145\/383259.383295"},{"key":"ref_30","unstructured":"Cheng, L., Vishwanathan, S., and Zhang, X. (2008, January 24\u201326). 
Consistent image analogies using semi-supervised learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA."},{"key":"ref_31","unstructured":"Gatys, L., Ecker, A., and Bethge, M. (2015, September 02). A neural algorithm of artistic style. Available online: https:\/\/arxiv.org\/pdf\/1508.06576.pdf."},{"key":"ref_32","unstructured":"Ulyanov, D., Lebedev, V., Vedaldi, A., and Lempitsky, V. (2016, January 19\u201324). Texture networks: Feed-forward synthesis of textures and stylized images. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Johnson, J., Alahi, A., and Li, F. (2016, January 8\u201316). Perceptual losses for real-time style transfer and super-resolution. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_43"},{"key":"ref_34","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). ImageNet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Ciregan, D., Meier, U., and Schmidhuber, J. (2012, January 16\u201321). Multi-column deep neural networks for image classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248110"},{"key":"ref_36","unstructured":"Ge, L., Liang, H., Yuan, J., and Thalmann, D. (July, January 26). Robust 3d hand pose estimation in single depth images: from single-view cnn to multi-view cnns. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Guo, H., Wang, G., Chen, X., Zhang, C., Qiao, F., and Yang, H. (2017, January 17\u201320). 
Region ensemble network: Improving convolutional network for hand pose estimation. Proceedings of the IEEE International Conference on Image Processing, Beijing, China.","DOI":"10.1109\/ICIP.2017.8297136"},{"key":"ref_38","unstructured":"Oberweger, M., Wohlhart, P., and Lepetit, V. (2015, January 9\u201311). Hands deep in deep learning for hand pose estimation. Proceedings of the Computer Vision Winter Workshop, Styria, Austria."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Tang, D., Yu, T.-H., and Kim, T.-K. (2013, January 3\u20136). Real-time articulated hand pose estimation using semi-supervised transductive regression forests. Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.400"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.1109\/TMM.2014.2306177","article-title":"Parsing the hand in depth images","volume":"16","author":"Liang","year":"2014","journal-title":"IEEE Trans. Multimed."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wan, C., Probst, T., Van Gool, L., and Yao, A. (2017, January 22\u201325). Crossing nets: Dual generative models with a shared latent space for hand pose estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.132"},{"key":"ref_42","unstructured":"Bouchacourt, D., Kumar, M.P., and Nowozin, S. (2016, January 5\u201310). DISCO Nets: Dissimilarity Coefficient Networks. Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Baek, S., Kim, K.I., and Kim, T.K. (2018, May 11). Augmented skeleton space transfer for depth-based hand pose estimation. 
Available online: https:\/\/arxiv.org\/pdf\/1805.04497v1.pdf.","DOI":"10.1109\/CVPR.2018.00869"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Oberweger, M., Wohlhart, P., and Lepetit, V. (2015, January 13\u201316). Training a Feedback Loop for Hand Pose Estimation. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.379"},{"key":"ref_45","unstructured":"Simonyan, K., and Zisserman, A. (2015, April 10). Very deep convolutional networks for large-scale image recognition. Available online: https:\/\/arxiv.org\/pdf\/1409.1556.pdf."},{"key":"ref_46","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.cviu.2016.11.005","article-title":"Guided Optimisation through Classification and Regression for Hand Pose Estimation","volume":"115","author":"Krejov","year":"2017","journal-title":"Comput. Vision Image Understanding"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/13\/2919\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:01:35Z","timestamp":1760187695000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/13\/2919"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,1]]},"references-count":47,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2019,7]]}},"alternative-id":["s19132919"],"URL":"https:\/\/doi.org\/10.3390\/s19132919","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,7,1]]}}}