{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T23:35:56Z","timestamp":1772580956360,"version":"3.50.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2020,11,27]],"date-time":"2020-11-27T00:00:00Z","timestamp":1606435200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","award":["RTI2018-098694-B-I00"],"award-info":[{"award-number":["RTI2018-098694-B-I00"]}]},{"DOI":"10.13039\/100011199","name":"European Research Council","doi-asserted-by":"publisher","award":["770784, 772738"],"award-info":[{"award-number":["770784, 772738"]}],"id":[{"id":"10.13039\/100011199","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>\n            Tracking and reconstructing the 3D pose and geometry of two hands in interaction is a challenging problem that has a high relevance for several human-computer interaction applications, including AR\/VR, robotics, or sign language recognition. Existing works are either limited to simpler tracking settings (\n            <jats:italic>e.g.<\/jats:italic>\n            , considering only a single hand or two spatially separated hands), or rely on less ubiquitous sensors, such as depth cameras. In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions. 
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN that regresses multiple complementary pieces of information, including segmentation, dense matchings to a 3D hand model, and 2D keypoint positions, together with newly proposed intra-hand relative depth and inter-hand distance maps. These predictions are subsequently used in a generative model fitting framework in order to estimate pose and shape parameters of a 3D hand model for both hands. We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline through an extensive ablation study. Moreover, we demonstrate that our approach offers previously unseen two-hand tracking performance from RGB, and quantitatively and qualitatively outperforms existing RGB-based methods that were not explicitly designed for two-hand interactions. Our method even performs on par with depth-based real-time methods.\n          <\/jats:p>","DOI":"10.1145\/3414685.3417852","type":"journal-article","created":{"date-parts":[[2020,11,27]],"date-time":"2020-11-27T21:51:05Z","timestamp":1606513865000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":74,"title":["RGB2Hands"],"prefix":"10.1145","volume":"39","author":[{"given":"Jiayi","family":"Wang","sequence":"first","affiliation":[{"name":"MPI Informatics"}]},{"given":"Franziska","family":"Mueller","sequence":"additional","affiliation":[{"name":"MPI Informatics"}]},{"given":"Florian","family":"Bernard","sequence":"additional","affiliation":[{"name":"Technical University of Munich"}]},{"given":"Suzanne","family":"Sorli","sequence":"additional","affiliation":[{"name":"Universidad Rey Juan Carlos"}]},{"given":"Oleksandr","family":"Sotnychenko","sequence":"additional","affiliation":[{"name":"MPI Informatics"}]},{"given":"Neng","family":"Qian","sequence":"additional","affiliation":[{"name":"MPI 
Informatics"}]},{"given":"Miguel A.","family":"Otaduy","sequence":"additional","affiliation":[{"name":"Universidad Rey Juan Carlos"}]},{"given":"Dan","family":"Casas","sequence":"additional","affiliation":[{"name":"Universidad Rey Juan Carlos"}]},{"given":"Christian","family":"Theobalt","sequence":"additional","affiliation":[{"name":"MPI Informatics"}]}],"member":"320","published-online":{"date-parts":[[2020,11,27]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Augmented Skeleton Space Transfer for Depth-Based Hand Pose Estimation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Baek Seungryul","year":"2018","unstructured":"Seungryul Baek, Kwang In Kim, and Tae-Kyun Kim. 2018. Augmented Skeleton Space Transfer for Depth-Based Hand Pose Estimation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00116"},{"key":"e_1_2_2_3_1","volume-title":"International journal of computer vision 35, 1","author":"Belhumeur Peter N","year":"1999","unstructured":"Peter N Belhumeur, David J Kriegman, and Alan L Yuille. 1999. The bas-relief ambiguity. International journal of computer vision 35, 1 (1999), 33--44."},{"key":"e_1_2_2_4_1","volume-title":"The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Boukhayma Adnane","unstructured":"
Adnane Boukhayma, Rodrigo de Bem, and Philip H.S. Torr. 2019. 3D Hand Shape and Pose From Images in the Wild. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_5_1","volume-title":"Multigrid multidimensional scaling. Numerical linear algebra with applications 13, 2--3","author":"Bronstein Michael M","year":"2006","unstructured":"Michael M Bronstein, Alexander M Bronstein, Ron Kimmel, and Irad Yavneh. 2006. Multigrid multidimensional scaling. Numerical linear algebra with applications 13, 2--3 (2006), 149--171."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_41"},{"key":"e_1_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Zhe Cao Gines Hidalgo Tomas Simon Shih-En Wei and Yaser Sheikh. 2018. OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. In arXiv preprint arXiv:1812.08008.","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00706"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00878"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01109"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3311970"},{"key":"e_1_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Yana Hasson G\u00fcl Varol Dimitrios Tzionas Igor Kalevatykh Michael J. 
Black Ivan Laptev and Cordelia Schmid. 2019. Learning joint reconstruction of hands and manipulated objects. In CVPR.","DOI":"10.1109\/CVPR.2019.01208"},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV). 118--134","author":"Iqbal Umar","year":"2018","unstructured":"Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, and Jan Kautz. 2018. Hand pose estimation via latent 2.5D heatmap regression. In Proceedings of the European Conference on Computer Vision (ECCV). 118--134."},{"key":"e_1_2_2_14_1","volume-title":"Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh.","author":"Joo Hanbyul","year":"2017","unstructured":"Hanbyul Joo, Tomas Simon, Xulong Li, Hao Liu, Lei Tan, Lin Gui, Sean Banerjee, Timothy Scott Godisart, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara, and Yaser Sheikh. 2017. Panoptic Studio: A Massively Multiview System for Social Interaction Capture. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.438"},{"key":"e_1_2_2_16_1","volume-title":"Point-To-Pose Voting Based Hand Pose Estimation Using Residual Permutation Equivariant Layer. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Li Shile","year":"2019","unstructured":"Shile Li and Dongheui Lee. 2019. Point-To-Pose Voting Based Hand Pose Estimation Using Residual Permutation Equivariant Layer. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2532129.2532141"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00013"},{"key":"e_1_2_2_19_1","first-page":"49","article-title":"Real-time pose and shape reconstruction of two interacting hands with a single depth camera. ACM Transactions on Graphics (Proc","volume":"38","author":"Mueller Franziska","year":"2019","unstructured":"Franziska Mueller, Micah Davis, Florian Bernard, Oleksandr Sotnychenko, Mickeal Verschoor, Miguel A Otaduy, Dan Casas, and Christian Theobalt. 2019. Real-time pose and shape reconstruction of two interacting hands with a single depth camera. ACM Transactions on Graphics (Proc. SIGGRAPH) 38, 4 (2019), 49.","journal-title":"SIGGRAPH)"},{"key":"e_1_2_2_20_1","volume-title":"International Conference on Computer Vision (ICCV).","author":"Mueller Franziska","year":"2017","unstructured":"Franziska Mueller, Dushyant Mehta, Oleksandr Sotnychenko, Srinath Sridhar, Dan Casas, and Christian Theobalt. 2017. 
Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor. In International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2919332.2919832"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354910"},{"key":"e_1_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Paschalis Panteleris Nikolaos Kyriazis and Antonis A Argyros. 2015. 3D Tracking of Human Hands in Interaction with Unknown Objects. In BMVC. 123--1.","DOI":"10.5244\/C.29.123"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00054"},{"key":"e_1_2_2_25_1","volume-title":"Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).","author":"Pavlakos Georgios","unstructured":"Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. 2019. Expressive Body Capture: 3D Hands, Face, and Body from a Single Image. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_26_1","volume-title":"Workshop at the European Conference on Computer Vision. Springer, 356--371","author":"Rogez Gr\u00e9gory","year":"2014","unstructured":"
Gr\u00e9gory Rogez, Maryam Khademi, JS Supan\u010di\u010d III, Jose Maria Martinez Montiel, and Deva Ramanan. 2014. 3D hand pose detection in egocentric RGB-D images. In Workshop at the European Conference on Computer Vision. Springer, 356--371."},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130883"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.494"},{"key":"e_1_2_2_30_1","volume-title":"Cross-Modal Deep Variational Hand Pose Estimation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Spurr Adrian","year":"2018","unstructured":"Adrian Spurr, Jie Song, Seonwook Park, and Otmar Hilliges. 2018. Cross-Modal Deep Variational Hand Pose Estimation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298941"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_19"},{"key":"e_1_2_2_33_1","volume-title":"Robust Articulated-ICP for Real-Time Hand Tracking. Computer Graphics Forum (Symposium on Geometry Processing) 34, 5","author":"Tagliasacchi Andrea","year":"2015","unstructured":"Andrea Tagliasacchi, Matthias Schroeder, Anastasia Tkach, Sofien Bouaziz, Mario Botsch, and Mark Pauly. 2015. Robust Articulated-ICP for Real-Time Hand Tracking. 
Computer Graphics Forum (Symposium on Geometry Processing) 34, 5 (2015)."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.605"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.380"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925965"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130853"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00464"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2980226"},{"key":"e_1_2_2_40_1","volume-title":"Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks. ACM Transactions on Graphics 33 (August","author":"Tompson Jonathan","year":"2014","unstructured":"Jonathan Tompson, Murphy Stein, Yann Lecun, and Ken Perlin. 2014. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks. ACM Transactions on Graphics 33 (August 2014)."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0895-4"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.90"},{"key":"e_1_2_2_43_1","volume-title":"Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022","author":"Ulyanov Dmitry","year":"2016","unstructured":"Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. 2016. Instance normalization: The missing ingredient for fast stylization. 
arXiv preprint arXiv:1607.08022 (2016)."},{"key":"e_1_2_2_44_1","volume-title":"Soft Hand Simulation for Smooth and Robust Natural Interaction. In IEEE Conference on Virtual Reality and 3D User Interfaces (VR). IEEE, 183--190","author":"Verschoor Mickeal","year":"2018","unstructured":"Mickeal Verschoor, Daniel Lobo, and Miguel A Otaduy. 2018. Soft Hand Simulation for Smooth and Robust Natural Interaction. In IEEE Conference on Virtual Reality and 3D User Interfaces (VR). IEEE, 183--190."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.132"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01122"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00242"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00279"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00244"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508363.2508412"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.525"},{"key":"e_1_2_2_52_1","volume-title":"FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images. In The IEEE International Conference on Computer Vision (ICCV).","author":"Zimmermann Christian","year":"2019","unstructured":"Christian Zimmermann, Duygu Ceylan, Jimei Yang, Bryan Russell, Max Argus, and Thomas Brox. 2019. 
FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images. In The IEEE International Conference on Computer Vision (ICCV)."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3414685.3417852","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3414685.3417852","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:03:15Z","timestamp":1750197795000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3414685.3417852"}},"subtitle":["real-time tracking of 3D hand interactions from monocular RGB video"],"short-title":[],"issued":{"date-parts":[[2020,11,27]]},"references-count":52,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3414685.3417852"],"URL":"https:\/\/doi.org\/10.1145\/3414685.3417852","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,27]]},"assertion":[{"value":"2020-11-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}