{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T01:24:52Z","timestamp":1776216292248,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2020,5,16]],"date-time":"2020-05-16T00:00:00Z","timestamp":1589587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The estimation of human hand pose has become the basis for many vital applications where the user depends mainly on the hand pose as a system input. Virtual reality (VR) headset, shadow dexterous hand and in-air signature verification are a few examples of applications that require to track the hand movements in real-time. The state-of-the-art 3D hand pose estimation methods are based on the Convolutional Neural Network (CNN). These methods are implemented on Graphics Processing Units (GPUs) mainly due to their extensive computational requirements. However, GPUs are not suitable for the practical application scenarios, where the low power consumption is crucial. Furthermore, the difficulty of embedding a bulky GPU into a small device prevents the portability of such applications on mobile devices. The goal of this work is to provide an energy efficient solution for an existing depth camera based hand pose estimation algorithm. First, we compress the deep neural network model by applying the dynamic quantization techniques on different layers to achieve maximum compression without compromising accuracy. Afterwards, we design a custom hardware architecture. For our device we selected the FPGA as a target platform because FPGAs provide high energy efficiency and can be integrated in portable devices. Our solution implemented on Xilinx UltraScale+ MPSoC FPGA is 4.2\u00d7 faster and 577.3\u00d7 more energy efficient than the original implementation of the hand pose estimation algorithm on NVIDIA GeForce GTX 1070.<\/jats:p>","DOI":"10.3390\/s20102828","type":"journal-article","created":{"date-parts":[[2020,5,18]],"date-time":"2020-05-18T02:43:42Z","timestamp":1589769822000},"page":"2828","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Real-Time Energy Efficient Hand Pose Estimation: A Case Study"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8349-332X","authenticated-orcid":false,"given":"Mhd Rashed","family":"Al Koutayni","sequence":"first","affiliation":[{"name":"Microelectronic Systems Design Research Group, Department of Electrical and Computer Engineering, Technische Universit\u00e4t Kaiserslautern, 67663 Kaiserslautern, Germany"},{"name":"German Research Center for Artificial Intelligence, DFKI, 67663 Kaiserslautern, Germany"},{"name":"Department of Informatics, Technische Universit\u00e4t Kaiserslautern, 67663 Kaiserslautern, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vladimir","family":"Rybalkin","sequence":"additional","affiliation":[{"name":"Microelectronic Systems Design Research Group, Department of Electrical and Computer Engineering, Technische Universit\u00e4t Kaiserslautern, 67663 Kaiserslautern, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jameel","family":"Malik","sequence":"additional","affiliation":[{"name":"German Research Center for Artificial Intelligence, DFKI, 67663 Kaiserslautern, Germany"},{"name":"Department of Informatics, Technische Universit\u00e4t Kaiserslautern, 67663 Kaiserslautern, Germany"},{"name":"School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ahmed","family":"Elhayek","sequence":"additional","affiliation":[{"name":"German Research Center for Artificial Intelligence, DFKI, 67663 Kaiserslautern, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian","family":"Weis","sequence":"additional","affiliation":[{"name":"Microelectronic Systems Design Research Group, Department of Electrical and Computer Engineering, Technische Universit\u00e4t Kaiserslautern, 67663 Kaiserslautern, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gerd","family":"Reis","sequence":"additional","affiliation":[{"name":"German Research Center for Artificial Intelligence, DFKI, 67663 Kaiserslautern, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Norbert","family":"Wehn","sequence":"additional","affiliation":[{"name":"Microelectronic Systems Design Research Group, Department of Electrical and Computer Engineering, Technische Universit\u00e4t Kaiserslautern, 67663 Kaiserslautern, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Didier","family":"Stricker","sequence":"additional","affiliation":[{"name":"German Research Center for Artificial Intelligence, DFKI, 67663 Kaiserslautern, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,5,16]]},"reference":[{"key":"ref_1","unstructured":"Chen, Y., Tu, Z., Ge, L., Zhang, D., Chen, R., and Yuan, J. (November, January 27). So-handnet: Self-organizing network for 3d hand pose estimation with semi-supervised learning. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, S., and Lee, D. (2019, January 16\u201320). Point-to-pose voting based hand pose estimation using residual permutation equivariant layer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01220"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zimmermann, C., and Brox, T. (2017, January 22\u201329). Learning to estimate 3d hand pose from single rgb images. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.525"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, R., Paris, S., and Popovi\u0107, J. (2011, January 16\u201319). 6D hands: Markerless hand-tracking for computer aided design. Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, Santa Barbara, CA, USA.","DOI":"10.1145\/2047196.2047269"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Li, S., Ma, X., Liang, H., G\u00f6rner, M., Ruppel, P., Fang, B., Sun, F., and Zhang, J. (2018). Vision-based teleoperation of shadow dexterous hand using end-to-end deep neural network. arXiv.","DOI":"10.1109\/ICRA.2019.8794277"},{"key":"ref_6","unstructured":"Isaacs, J., and Foo, S. (2004, January 14\u201316). Hand pose estimation for American sign language recognition. Proceedings of the Thirty-Sixth Southeastern Symposium on System Theory, Atlanta, GA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Malik, J., Elhayek, A., Ahmed, S., Shafait, F., Malik, M., and Stricker, D. (2018). 3DAirSig: A Framework for Enabling In-Air Signatures Using a Multi-Modal Depth Sensor. Sensors, 18.","DOI":"10.3390\/s18113872"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yuan, S., Ye, Q., Stenger, B., Jain, S., and Kim, T.K. (2017, January 21\u201326). Bighand2. 2m benchmark: Hand pose dataset and state of the art analysis. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.279"},{"key":"ref_9","unstructured":"Vansteenkiste, E. (2016). New FPGA Design Tools and Architectures. [Ph.D. Thesis, Ghent University]."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Malik, J., Elhayek, A., and Stricker, D. (2017, January 10\u201312). Simultaneous Hand Pose and Skeleton Bone-Lengths Estimation from a Single Depth Image. Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China.","DOI":"10.1109\/3DV.2017.00069"},{"key":"ref_11","unstructured":"Oberweger, M., Wohlhart, P., and Lepetit, V. (2015). Hands deep in deep learning for hand pose estimation. arXiv."},{"key":"ref_12","unstructured":"Krishnamoorthi, R. (2018). Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv."},{"key":"ref_13","first-page":"6869","article-title":"Quantized neural networks: Training neural networks with low precision weights and activations","volume":"18","author":"Hubara","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Venieris, S.I., Kouris, A., and Bouganis, C.S. (2018). Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions. arXiv.","DOI":"10.1109\/FPL.2018.00072"},{"key":"ref_15","unstructured":"Liu, Z., Dou, Y., Jiang, J., and Xu, J. (2016, January 7\u20139). Automatic code generation of convolutional neural networks in FPGA implementation. Proceedings of the 2016 International Conference on Field-Programmable Technology (FPT), Xi\u2019an, China."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1109\/LES.2017.2743247","article-title":"Tactics to directly map CNN graphs on embedded FPGAs","volume":"9","author":"Abdelouahab","year":"2017","journal-title":"IEEE Embed. Syst. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Abdelouahab, K., Bourrasset, C., Pelcat, M., Berry, F., Quinton, J.C., and Serot, J. (2016, January 12\u201315). A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA. Proceedings of the 10th International Conference on Distributed Smart Camera, Paris, France.","DOI":"10.1145\/2967413.2967430"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wang, Y., Xu, J., Han, Y., Li, H., and Li, X. (2016, January 5\u20139). DeepBurning: Automatic generation of FPGA-based learning accelerators for the neural network family. Proceedings of the 53rd Annual Design Automation Conference, Austin, TX, USA.","DOI":"10.1145\/2897937.2898003"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Venieris, S.I., and Bouganis, C.S. (2017, January 4\u20138). Latency-driven design for FPGA-based convolutional neural networks. Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium.","DOI":"10.23919\/FPL.2017.8056828"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Venieris, S.I., and Bouganis, C.S. (2017). fpgaConvNet: A toolflow for mapping diverse convolutional neural networks on embedded FPGAs. arXiv.","DOI":"10.1145\/3020078.3021791"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Venieris, S.I., and Bouganis, C.S. (2016, January 1\u20133). fpgaConvNet: A framework for mapping convolutional neural networks on FPGAs. Proceedings of the 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Washington, DC, USA.","DOI":"10.1109\/FCCM.2016.22"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Umuroglu, Y., Fraser, N.J., Gambardella, G., Blott, M., Leong, P., Jahre, M., and Vissers, K. (2017, January 22\u201324). Finn: A framework for fast, scalable binarized neural network inference. Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/3020078.3021744"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Mamalet, F., and Garcia, C. (2012). Simplifying convnets for fast learning. International Conference on Artificial Neural Networks, Springer.","DOI":"10.1007\/978-3-642-33266-1_8"},{"key":"ref_24","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017, January 9). Automatic Differentiation in PyTorch. Proceedings of the NIPS Autodiff Workshop, Long Beach, CA, USA."},{"key":"ref_25","unstructured":"Zhou, X., Wan, Q., Zhang, W., Xue, X., and Wei, Y. (2016). Model-based deep hand pose estimation. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1145\/2629500","article-title":"Real-time continuous pose recovery of human hands using convolutional networks","volume":"33","author":"Tompson","year":"2014","journal-title":"ACM Trans. Graph. (ToG)"},{"key":"ref_27","unstructured":"Miyashita, D., Lee, E.H., and Murmann, B. (2016). Convolutional neural networks using logarithmic data representation. arXiv."},{"key":"ref_28","unstructured":"Matai, J., Richmond, D., Lee, D., and Kastner, R. (2014). Enabling FPGAs for the masses. arXiv."},{"key":"ref_29","unstructured":"Vallina, F.M. (2012). Implementing Memory Structures for Video Processing in the Vivado HLS Tool, Xilinx, Inc.. XAPP793 (v1. 0), 20 September."},{"key":"ref_30","unstructured":"Xilinx (2019). Vivado Design Suite User Guide: High-Level Synthesis (UG902), Xilinx, Inc."},{"key":"ref_31","unstructured":"Xilinx (2018). ZCU102 Evaluation Board (UG1182), Xilinx, Inc."},{"key":"ref_32","unstructured":"(2020, May 07). ONNX Runtime. Available online: https:\/\/github.com\/microsoft\/onnxruntime."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Oberweger, M., and Lepetit, V. (2017, January 22\u201329). Deepprior++: Improving fast and accurate 3d hand pose estimation. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.75"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Malik, J., Elhayek, A., Nunnari, F., Varanasi, K., Tamaddon, K., Heloir, A., and Stricker, D. (2018, January 5\u20138). Deephps: End-to-end estimation of 3d hand pose and shape by learning from synthetic depth. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00023"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_37","unstructured":"Chidananda, P., Sinha, A., Rao, A., Lee, D., and Rabinovich, A. (2019). Efficient 2.5 D Hand Pose Estimation via Auxiliary Multi-Task Training for Embedded Devices. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/10\/2828\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:29:23Z","timestamp":1760174963000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/10\/2828"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,16]]},"references-count":37,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["s20102828"],"URL":"https:\/\/doi.org\/10.3390\/s20102828","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,16]]}}}