{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T01:42:19Z","timestamp":1772674939574,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2017,11,20]],"date-time":"2017-11-20T00:00:00Z","timestamp":1511136000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2017,12,31]]},"abstract":"<jats:p>The state of the art in articulated hand tracking has been greatly advanced by hybrid methods that fit a generative hand model to depth data, leveraging both temporally and discriminatively predicted starting poses. In this paradigm, the generative model is used to define an energy function and a local iterative optimization is performed from these starting poses in order to find a \"good local minimum\" (i.e. a local minimum close to the true pose). Performing this optimization quickly is key to exploring more starting poses, performing more iterations and, crucially, exploiting high frame rates that ensure that temporally predicted starting poses are in the basin of convergence of a good local minimum. At the same time, a detailed and accurate generative model tends to deepen the good local minima and widen their basins of convergence. Recent work, however, has largely had to trade-off such a detailed hand model with one that facilitates such rapid optimization. We present a new implicit model of hand geometry that mostly avoids this compromise and leverage it to build an ultra-fast hybrid hand tracking system. 
Specifically, we construct an articulated signed distance function that, for any pose, yields a closed form calculation of both the distance to the detailed surface geometry and the necessary derivatives to perform gradient based optimization. There is no need to introduce or update any explicit \"correspondences\" yielding a simple algorithm that maps well to parallel hardware such as GPUs. As a result, our system can run at extremely high frame rates (e.g. up to 1000fps). Furthermore, we demonstrate how to detect, segment and optimize for two strongly interacting hands, recovering complex interactions at extremely high framerates. In the absence of publicly available datasets of sufficiently high frame rate, we leverage a multiview capture system to create a new 180fps dataset of one and two hands interacting together or with objects.<\/jats:p>","DOI":"10.1145\/3130800.3130853","type":"journal-article","created":{"date-parts":[[2017,11,22]],"date-time":"2017-11-22T16:25:08Z","timestamp":1511367908000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":74,"title":["Articulated distance fields for ultra-fast tracking of hands 
interacting"],"prefix":"10.1145","volume":"36","author":[{"given":"Jonathan","family":"Taylor","sequence":"first","affiliation":[{"name":"perceptiveIO"}]},{"given":"Vladimir","family":"Tankovich","sequence":"additional","affiliation":[{"name":"perceptiveIO"}]},{"given":"Danhang","family":"Tang","sequence":"additional","affiliation":[{"name":"perceptiveIO"}]},{"given":"Cem","family":"Keskin","sequence":"additional","affiliation":[{"name":"perceptiveIO"}]},{"given":"David","family":"Kim","sequence":"additional","affiliation":[{"name":"perceptiveIO"}]},{"given":"Philip","family":"Davidson","sequence":"additional","affiliation":[{"name":"perceptiveIO"}]},{"given":"Adarsh","family":"Kowdle","sequence":"additional","affiliation":[{"name":"perceptiveIO"}]},{"given":"Shahram","family":"Izadi","sequence":"additional","affiliation":[{"name":"perceptiveIO"}]}],"member":"320","published-online":{"date-parts":[[2017,11,20]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33783-3_46"},{"key":"e_1_2_2_2_1","unstructured":"Blender Online Community. 2016. Blender - a 3D modelling and rendering package. Blender Foundation Blender Institute Amsterdam. http:\/\/www.blender.org"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.68"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.1000236"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.33"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925969"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298647"},{"key":"e_1_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Sean Ryan Fanello Julien Valentin Adarsh Kowdle Christoph Rhemann Vladimir Tankovich Carlo Ciliberto Philip Davidson and Shahram Izadi. 
2017a. Low Compute and Fully Parallel Computer Vision with HashMatch. In ICCV.","DOI":"10.1109\/ICCV.2017.418"},{"key":"e_1_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Sean Ryan Fanello Julien Valentin Christoph Rhemann Adarsh Kowdle Vladimir Tankovich Philip Davidson and Shahram Izadi. 2017b. UltraStereo: Efficient Learning-based Matching for Active Stereo Systems. In CVPR.","DOI":"10.1109\/CVPR.2017.692"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2003.09.004"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.391"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126270"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.470"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_22"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2047196.2047270"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.605"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33783-3_61"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298869"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1002\/nme.1296"},{"key":"e_1_2_2_20_1","unstructured":"Jonathan Long Evan Shelhamer and Trevor Darrell. 2014. 
Fully Convolutional Networks for Semantic Segmentation. CoRR abs\/1411.4038 (2014). http:\/\/arxiv.org\/abs\/1411.4038"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2532129.2532141"},{"key":"e_1_2_2_22_1","volume-title":"Proceedings of International Conference on Computer Vision (ICCV). 10","author":"Mueller Franziska","year":"2017"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298631"},{"key":"e_1_2_2_24_1","unstructured":"Markus Oberweger Paul Wohlhart and Vincent Lepetit. 2015a. Hands Deep in Deep Learning for Hand Pose Estimation. CoRR abs\/1502.06807 (2015). http:\/\/arxiv.org\/abs\/1502.06807"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/2919332.2919832"},{"key":"e_1_2_2_26_1","first-page":"3","article-title":"Efficient model-based 3D tracking of hand articulations using Kinect","volume":"1","author":"Oikonomidis Iason","year":"2011","journal-title":"BmVC"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354910"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.145"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.277"},{"key":"e_1_2_2_30_1","volume-title":"DART: Dense Articulated Real-Time Tracking. In Robotics: Science and Systems.","author":"Schmidt Tanner","year":"2014"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702179"},{"key":"e_1_2_2_32_1","doi-asserted-by":"crossref","unstructured":"Jamie Shotton Andrew Fitzgibbon Andrew Blake Alex Kipman Mark Finocchio Bob Moore and Toby Sharp. 2011. Real-Time Human Pose Recognition in Parts from a Single Depth Image. 
https:\/\/www.microsoft.com\/en-us\/research\/publication\/real-time-human-pose-recognition-in-parts-from-a-single-depth-image\/","DOI":"10.1109\/CVPR.2011.5995316"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629697"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.450"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298941"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.305"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298683"},{"key":"e_1_2_2_38_1","volume-title":"Robust Articulated-ICP for Real-Time Hand Tracking. Symposium on Geometry Processing (Computer Graphics Forum)","author":"Tagliasacchi Andrea","year":"2015"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.605"},{"key":"e_1_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Danhang Tang Hyung Jin Chang Alykhan Tejani and Tae-Kyun Kim. 2016. Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture. 
In Transactions on Pattern Analysis and Machine Intelligence.","DOI":"10.1109\/TPAMI.2016.2599170"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.380"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925965"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.88"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2980226"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130830"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629500"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0895-4"},{"key":"e_1_2_2_48_1","unstructured":"Chengde Wan Thomas Probst Luc Van Gool and Angela Yao. 2017. Crossing Nets: Dual Generative Models with a Shared Latent Space for Hand Pose Estimation. arXiv preprint arXiv.1702.03431 (2017)."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_21"},{"key":"e_1_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Shanxin Yuan Qi Ye Bjorn Stenger Siddhand Jain and Tae-Kyun Kim. 2017. BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis. arXiv preprint arXiv.1704.02612 (2017).","DOI":"10.1109\/CVPR.2017.279"},{"key":"e_1_2_2_51_1","unstructured":"Xingyi Zhou Qingfu Wan Wei Zhang Xiangyang Xue and Yichen Wei. 2016. Model-based Deep Hand Pose Estimation. 
In IJCAI."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3130800.3130853","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3130800.3130853","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:11:18Z","timestamp":1750212678000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3130800.3130853"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,11,20]]},"references-count":51,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2017,12,31]]}},"alternative-id":["10.1145\/3130800.3130853"],"URL":"https:\/\/doi.org\/10.1145\/3130800.3130853","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,11,20]]},"assertion":[{"value":"2017-11-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}