{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:38:36Z","timestamp":1775230716013,"version":"3.50.1"},"reference-count":86,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T00:00:00Z","timestamp":1722470400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T00:00:00Z","timestamp":1722470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100010418","name":"Institute for Information and Communications Technology Promotion","doi-asserted-by":"publisher","award":["2019-0-01270"],"award-info":[{"award-number":["2019-0-01270"]}],"id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010418","name":"Institute for Information and Communications Technology Promotion","doi-asserted-by":"publisher","award":["RS-2023-00228996"],"award-info":[{"award-number":["RS-2023-00228996"]}],"id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006465","name":"Korea Creative Content Agency","doi-asserted-by":"publisher","award":["R2022020028"],"award-info":[{"award-number":["R2022020028"]}],"id":[{"id":"10.13039\/501100006465","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008783","name":"National Research Council of Science and Technology","doi-asserted-by":"publisher","award":["CRC 21011"],"award-info":[{"award-number":["CRC 21011"]}],"id":[{"id":"10.13039\/501100008783","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Virtual Reality"],"abstract":"<jats:title>Abstract<\/jats:title>\n        
          <jats:p>\n                    We propose a robust 3D hand tracking system in various hand action environments, including hand-object interaction, which utilizes a single color image and a previous pose prediction as input. We observe that existing methods deterministically exploit temporal information in motion space, failing to address realistic diverse hand motions. Also, prior methods paid less attention to efficiency as well as robust performance, i.e., the balance issues between time and accuracy. The Temporally Enhanced Graph Convolutional Network (TE-GCN) utilizes a 2-stage framework to encode temporal information adaptively. The system establishes balance by adopting an adaptive GCN, which effectively learns the spatial dependency between hand mesh vertices. Furthermore, the system leverages the previous prediction by estimating the relevance across image features through the attention mechanism. The proposed method achieves state-of-the-art balanced performance on challenging benchmarks and demonstrates robust results on various hand motions in real scenes. Moreover, the hand tracking system is integrated into a recent HMD with an off-loading framework, achieving a real-time framerate while maintaining high performance. Our study improves the usability of a high-performance hand-tracking method, which can be generalized to other algorithms and contributes to the usage of HMD in everyday life. 
Our code with the HMD project will be available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/UVR-WJCHO\/TEGCN_on_Hololens2\">https:\/\/github.com\/UVR-WJCHO\/TEGCN_on_Hololens2<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1007\/s10055-024-01039-3","type":"journal-article","created":{"date-parts":[[2024,8,1]],"date-time":"2024-08-01T05:05:27Z","timestamp":1722488727000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Temporally enhanced graph convolutional network for hand tracking from an egocentric camera"],"prefix":"10.1007","volume":"28","author":[{"given":"Woojin","family":"Cho","sequence":"first","affiliation":[]},{"given":"Taewook","family":"Ha","sequence":"additional","affiliation":[]},{"given":"Ikbeom","family":"Jeon","sequence":"additional","affiliation":[]},{"given":"Jinwoo","family":"Jeon","sequence":"additional","affiliation":[]},{"given":"Tae-Kyun","family":"Kim","sequence":"additional","affiliation":[]},{"given":"Woontack","family":"Woo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,8,1]]},"reference":[{"key":"1039_CR1","doi-asserted-by":"crossref","unstructured":"Armagan A, Garcia-Hernando G, Baek S, Hampali S, Rad M, Zhang Z, Xie S, Chen M, Zhang B, Xiong F et\u00a0al. (2020) Measuring generalisation to unseen viewpoints, articulations, shapes and objects for 3D hand pose estimation under hand-object interaction. In: Computer vision\u2013ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXIII 16, Springer, pp 85\u2013101","DOI":"10.1007\/978-3-030-58592-1_6"},{"key":"1039_CR2","doi-asserted-by":"crossref","unstructured":"Baek S, Kim KI, Kim T-K (2019) Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1067\u20131076","DOI":"10.1109\/CVPR.2019.00116"},{"key":"1039_CR3","doi-asserted-by":"crossref","unstructured":"Baek S, Kim KI, Kim T-K (2020) Weakly-supervised domain adaptation via gan and mesh model for estimating 3d hand poses interacting objects. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 6121\u20136131","DOI":"10.1109\/CVPR42600.2020.00616"},{"key":"1039_CR4","doi-asserted-by":"crossref","unstructured":"Boukhayma A, Bem Rd, Torr PH (2019) 3D hand shape and pose from images in the wild. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10843\u201310852","DOI":"10.1109\/CVPR.2019.01110"},{"key":"1039_CR5","unstructured":"Bruna J, Zaremba W, Szlam A, LeCun Y (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203"},{"key":"1039_CR6","doi-asserted-by":"crossref","unstructured":"Cai Y, Ge L, Liu J, Cai J, Cham T-J, Yuan J, Thalmann NM (2019) Exploiting spatial-temporal relationships for 3D pose estimation via graph convolutional networks. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 2272\u20132281","DOI":"10.1109\/ICCV.2019.00236"},{"key":"1039_CR7","doi-asserted-by":"crossref","unstructured":"Cao Z, Radosavovic I, Kanazawa A, Malik J (2021) Reconstructing hand-object interactions in the wild. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 12417\u201312426","DOI":"10.1109\/ICCV48922.2021.01219"},{"key":"1039_CR8","doi-asserted-by":"crossref","unstructured":"Chao Y-W, Yang W, Xiang Y, Molchanov P, Handa A, Tremblay J, Narang YS, Van\u00a0Wyk K, Iqbal U, Birchfield S et\u00a0al. (2021) Dexycb: a benchmark for capturing hand grasping of objects. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 9044\u20139053","DOI":"10.1109\/CVPR46437.2021.00893"},{"key":"1039_CR9","doi-asserted-by":"crossref","unstructured":"Chen L, Lin S-Y, Xie Y, Lin Y-Y, Xie X (2021a) Temporal-aware self-supervised learning for 3D hand pose and mesh estimation in videos. In: Proceedings of the IEEE\/CVF winter conference on applications of computer vision, pp 1050\u20131059","DOI":"10.1109\/WACV48630.2021.00109"},{"key":"1039_CR10","doi-asserted-by":"crossref","unstructured":"Chen X, Liu Y, Ma C, Chang J, Wang H, Chen T, Guo X, Wan P, Zheng W (2021b) Camera-space hand mesh recovery via semantic aggregation and adaptive 2D-1D registration. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13274\u201313283","DOI":"10.1109\/CVPR46437.2021.01307"},{"key":"1039_CR11","doi-asserted-by":"crossref","unstructured":"Chen X, Liu Y, Dong Y, Zhang X, Ma C, Xiong Y, Zhang Y, Guo X (2022a) Mobrecon: mobile-friendly hand mesh reconstruction from monocular image. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 20544\u201320554","DOI":"10.1109\/CVPR52688.2022.01989"},{"key":"1039_CR12","doi-asserted-by":"crossref","unstructured":"Chen Y, Tu Z, Kang D, Bao L, Zhang Y, Zhe X, Chen R, Yuan J (2021c) Model-based 3D hand reconstruction via self-supervised learning. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10451\u201310460","DOI":"10.1109\/CVPR46437.2021.01031"},{"key":"1039_CR13","doi-asserted-by":"crossref","unstructured":"Chen Z, Hasson Y, Schmid C, Laptev I (2022b) Alignsdf: pose-aligned signed distance fields for hand-object reconstruction. 
In: European conference on computer vision, Springer, pp 231\u2013248","DOI":"10.1007\/978-3-031-19769-7_14"},{"key":"1039_CR14","doi-asserted-by":"crossref","unstructured":"Chen Z, Chen S, Schmid C, Laptev I (2023) gsdf: geometry-driven signed distance functions for 3D hand-object reconstruction. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 12890\u201312900","DOI":"10.1109\/CVPR52729.2023.01239"},{"key":"1039_CR15","doi-asserted-by":"crossref","unstructured":"Cho J, Youwang K, Oh T-H (2022) Cross-attention of disentangled modalities for 3D human mesh recovery with transformers. In: European conference on computer vision, Springer, pp 342\u2013359","DOI":"10.1007\/978-3-031-19769-7_20"},{"key":"1039_CR16","doi-asserted-by":"crossref","unstructured":"Choi H, Moon G, Lee KM (2020) Pose2mesh: graph convolutional network for 3D human pose and mesh recovery from a 2D human pose. In: Computer vision\u2014ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part VII 16, Springer, pp 769\u2013787","DOI":"10.1007\/978-3-030-58571-6_45"},{"key":"1039_CR17","unstructured":"Defferrard M, Bresson X, Vandergheynst P (2016) Convolutional neural networks on graphs with fast localized spectral filtering. Adv Neural Inf Process Syst 29"},{"key":"1039_CR18","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, IEEE, pp 248\u2013255","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"1039_CR19","doi-asserted-by":"crossref","unstructured":"Dodge S, Karam L (2016) Understanding how image quality affects deep neural networks. 
In: 2016 eighth international conference on quality of multimedia experience (QoMEX), IEEE, pp 1\u20136","DOI":"10.1109\/QoMEX.2016.7498955"},{"key":"1039_CR20","doi-asserted-by":"crossref","unstructured":"Doosti B, Naha S, Mirbagheri M, Crandall DJ (2020) Hope-net: a graph-based model for hand-object pose estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 6608\u20136617","DOI":"10.1109\/CVPR42600.2020.00664"},{"key":"1039_CR21","doi-asserted-by":"crossref","unstructured":"Fan Z, Spurr A, Kocabas M, Tang S, Black MJ, Hilliges O (2021) Learning to disambiguate strongly interacting hands via probabilistic per-pixel part segmentation. In: 2021 International Conference on 3D Vision (3DV), IEEE, pp 1\u201310","DOI":"10.1109\/3DV53792.2021.00011"},{"key":"1039_CR22","doi-asserted-by":"crossref","unstructured":"Fu Q, Liu X, Xu R, Niebles JC, Kitani KM (2023) Deformer: dynamic fusion transformer for robust hand pose estimation. arXiv preprint arXiv:2303.04991","DOI":"10.1109\/ICCV51070.2023.02157"},{"key":"1039_CR23","doi-asserted-by":"crossref","unstructured":"Ge L, Ren Z, Li Y, Xue Z, Wang Y, Cai J, Yuan J (2019) 3D hand shape and pose estimation from a single RGB image. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 10833\u201310842","DOI":"10.1109\/CVPR.2019.01109"},{"key":"1039_CR24","unstructured":"Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In: International conference on machine learning, PMLR, pp 1263\u20131272"},{"key":"1039_CR25","doi-asserted-by":"crossref","unstructured":"Hampali S, Rad M, Oberweger M, Lepetit V (2020) Honnotate: a method for 3D annotation of hand and object poses. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 3196\u20133206","DOI":"10.1109\/CVPR42600.2020.00326"},{"issue":"4","key":"1039_CR26","doi-asserted-by":"publisher","first-page":"87-1","DOI":"10.1145\/3386569.3392452","volume":"39","author":"S Han","year":"2020","unstructured":"Han S, Liu B, Cabezas R, Twigg CD, Zhang P, Petkau J, Yu T-H, Tai C-J, Akbay M, Wang Z et al (2020) Megatrack: monochrome egocentric articulated hand-tracking for virtual reality. ACM Trans Graph (ToG) 39(4):87\u20131","journal-title":"ACM Trans Graph (ToG)"},{"key":"1039_CR27","doi-asserted-by":"crossref","unstructured":"Han S, Wu P-c, Zhang Y, Liu B, Zhang L, Wang Z, Si W, Zhang P, Cai Y, Hodan T, et al. (2022) Umetrack: unified multi-view end-to-end hand tracking for vr. In: SIGGRAPH Asia 2022 conference papers, pp 1\u20139","DOI":"10.1145\/3550469.3555378"},{"key":"1039_CR28","doi-asserted-by":"crossref","unstructured":"Hasson Y, Varol G, Tzionas D, Kalevatykh I, Black MJ, Laptev I, Schmid C (2019a) Learning joint reconstruction of hands and manipulated objects. In: CVPR","DOI":"10.1109\/CVPR.2019.01208"},{"key":"1039_CR29","doi-asserted-by":"crossref","unstructured":"Hasson Y, Varol G, Tzionas D, Kalevatykh I, Black MJ, Laptev I, Schmid C (2019b) Learning joint reconstruction of hands and manipulated objects. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 11807\u201311816","DOI":"10.1109\/CVPR.2019.01208"},{"key":"1039_CR30","doi-asserted-by":"crossref","unstructured":"Hasson Y, Tekin B, Bogo F, Laptev I, Pollefeys M, Schmid C (2020) Leveraging photometric consistency over time for sparsely supervised hand-object reconstruction. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 571\u2013580","DOI":"10.1109\/CVPR42600.2020.00065"},{"key":"1039_CR31","doi-asserted-by":"crossref","unstructured":"Hasson Y, Varol G, Schmid C, Laptev I (2021) Towards unconstrained joint hand-object reconstruction from RGB videos. In: 2021 International conference on 3D vision (3DV), IEEE, pp 659\u2013668","DOI":"10.1109\/3DV53792.2021.00075"},{"key":"1039_CR32","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1039_CR33","doi-asserted-by":"crossref","unstructured":"Hossain MRI, Little JJ (2018) Exploiting temporal information for 3D human pose estimation. In: Proceedings of the European conference on computer vision (ECCV), pp 68\u201384","DOI":"10.1007\/978-3-030-01249-6_5"},{"key":"1039_CR34","doi-asserted-by":"crossref","unstructured":"Iqbal U, Molchanov P, Gall TBJ, Kautz J (2018) Hand pose estimation via latent 2.5 d heatmap regression. In: Proceedings of the European conference on computer vision (ECCV), pp 118\u2013134","DOI":"10.1007\/978-3-030-01252-6_8"},{"key":"1039_CR35","doi-asserted-by":"crossref","unstructured":"Kanazawa A, Zhang JY, Felsen P, Malik J (2019) Learning 3D human dynamics from video. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5614\u20135623","DOI":"10.1109\/CVPR.2019.00576"},{"key":"1039_CR36","unstructured":"Kingma DP, Welling M (2013) Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114"},{"key":"1039_CR37","unstructured":"Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. 
arXiv preprint arXiv:1609.02907"},{"key":"1039_CR38","doi-asserted-by":"crossref","unstructured":"Kocabas M, Athanasiou N, Black MJ (2020) Vibe: Video inference for human body pose and shape estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5253\u20135263","DOI":"10.1109\/CVPR42600.2020.00530"},{"key":"1039_CR39","doi-asserted-by":"crossref","unstructured":"Kulon D, Guler RA, Kokkinos I, Bronstein MM, Zafeiriou S (2020) Weakly-supervised mesh-convolutional hand reconstruction in the wild. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 4990\u20135000","DOI":"10.1109\/CVPR42600.2020.00504"},{"key":"1039_CR40","unstructured":"Lepetit V (2020) Recent advances in 3d object and hand pose estimation. arXiv preprint arXiv:2006.05927"},{"key":"1039_CR41","doi-asserted-by":"crossref","unstructured":"Li K, Yang L, Zhan X, Lv J, Xu W, Li J, Lu C (2021) Artiboost: boosting articulated 3D hand-object pose estimation via online exploration and synthesis. arXiv preprint arXiv:2109.05488","DOI":"10.1109\/CVPR52688.2022.00277"},{"key":"1039_CR42","doi-asserted-by":"crossref","unstructured":"Li M, An L, Zhang H, Wu L, Chen F, Yu T, Liu Y (2022) Interacting attention graph for single image two-hand reconstruction. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2761\u20132770","DOI":"10.1109\/CVPR52688.2022.00278"},{"key":"1039_CR43","doi-asserted-by":"crossref","unstructured":"Lim GM, Jatesiktat P, Ang WT (2020) Mobilehand: Real-time 3d hand shape and pose estimation from color image. 
In: Neural information processing: 27th international conference, ICONIP 2020, Bangkok, Thailand, November 18\u201322, 2020, Proceedings, Part IV, Springer, pp 450\u2013459","DOI":"10.1007\/978-3-030-63820-7_52"},{"key":"1039_CR44","doi-asserted-by":"crossref","unstructured":"Lin K, Wang L, Liu Z (2021a) End-to-end human pose and mesh reconstruction with transformers. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1954\u20131963","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"1039_CR45","doi-asserted-by":"crossref","unstructured":"Lin K, Wang L, Liu Z (2021b) Mesh graphormer. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 12939\u201312948","DOI":"10.1109\/ICCV48922.2021.01270"},{"key":"1039_CR46","doi-asserted-by":"crossref","unstructured":"Lin Z, Ding C, Yao H, Kuang Z, Huang S (2023) Harmonious feature learning for interactive hand-object pose estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 12989\u201312998","DOI":"10.1109\/CVPR52729.2023.01248"},{"key":"1039_CR47","doi-asserted-by":"crossref","unstructured":"Liu S, Jiang H, Xu J, Liu S, Wang X (2021) Semi-supervised 3d hand-object poses estimation with interactions in time. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 14687\u201314697","DOI":"10.1109\/CVPR46437.2021.01445"},{"key":"1039_CR48","unstructured":"Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101"},{"key":"1039_CR49","doi-asserted-by":"crossref","unstructured":"Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5115\u20135124","DOI":"10.1109\/CVPR.2017.576"},{"key":"1039_CR50","doi-asserted-by":"crossref","unstructured":"Moon G, Lee KM (2020) I2l-meshnet: Image-to-lixel prediction network for accurate 3d human pose and mesh estimation from a single RGB image. In: European conference on computer vision, Springer, pp 752\u2013768","DOI":"10.1007\/978-3-030-58571-6_44"},{"key":"1039_CR51","doi-asserted-by":"crossref","unstructured":"Moon G, Yu S-I, Wen H, Shiratori T, Lee KM (2020) Interhand2.6m: A dataset and baseline for 3d interacting hand pose estimation from a single RGB image. In: Computer vision\u2013ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XX 16, Springer, pp 548\u2013564","DOI":"10.1007\/978-3-030-58565-5_33"},{"key":"1039_CR52","doi-asserted-by":"crossref","unstructured":"Mueller F, Bernard F, Sotnychenko O, Mehta D, Sridhar S, Casas D, Theobalt C (2018) Ganerated hands for real-time 3D hand tracking from monocular RGB. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 49\u201359","DOI":"10.1109\/CVPR.2018.00013"},{"issue":"5","key":"1039_CR53","doi-asserted-by":"publisher","first-page":"1891","DOI":"10.1109\/TVCG.2020.2973057","volume":"26","author":"G Park","year":"2020","unstructured":"Park G, Argyros A, Lee J, Woo W (2020a) 3d hand tracking in the presence of excessive motion blur. IEEE Trans Vis Comput Graph 26(5):1891\u20131901","journal-title":"IEEE Trans Vis Comput Graph"},{"key":"1039_CR54","doi-asserted-by":"crossref","unstructured":"Park G, Kim T-K, Woo W (2020b) 3d hand pose estimation with a single infrared camera via domain transfer learning. 
In: 2020 IEEE International symposium on mixed and augmented reality (ISMAR), IEEE, pp 588\u2013599","DOI":"10.1109\/ISMAR50242.2020.00086"},{"key":"1039_CR55","doi-asserted-by":"crossref","unstructured":"Qu W, Cui Z, Zhang Y, Meng C, Ma C, Deng X, Wang H (2023) Novel-view synthesis and pose estimation for hand-object interaction from sparse views. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 15100\u201315111","DOI":"10.1109\/ICCV51070.2023.01386"},{"key":"1039_CR56","doi-asserted-by":"crossref","unstructured":"Ren P, Wen C, Zheng X, Xue Z, Sun H, Qi Q, Wang J, Liao J (2023) Decoupled iterative refinement framework for interacting hands reconstruction from a single RGB image. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 8014\u20138025","DOI":"10.1109\/ICCV51070.2023.00736"},{"key":"1039_CR57","doi-asserted-by":"crossref","unstructured":"Romero J, Tzionas D, Black MJ (Nov. 2017a) Embodied hands: modeling and capturing hands and bodies together. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia). URL http:\/\/doi.acm.org\/10.1145\/3130800.3130883","DOI":"10.1145\/3130800.3130883"},{"issue":"6","key":"1039_CR58","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3130800.3130883","volume":"36","author":"J Romero","year":"2017","unstructured":"Romero J, Tzionas D, Black MJ (2017b) Embodied hands: modeling and capturing hands and bodies together. ACM Trans Graph (TOG) 36(6):1\u201317","journal-title":"ACM Trans Graph (TOG)"},{"key":"1039_CR59","doi-asserted-by":"crossref","unstructured":"Spurr A, Song J, Park S, Hilliges O (2018) Cross-modal deep variational hand pose estimation. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 89\u201398","DOI":"10.1109\/CVPR.2018.00017"},{"key":"1039_CR60","doi-asserted-by":"crossref","unstructured":"Spurr A, Iqbal U, Molchanov P, Hilliges O, Kautz J (2020) Weakly supervised 3d hand pose estimation via biomechanical constraints. In: Computer vision\u2013ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XVII 16, Springer, pp 211\u2013228","DOI":"10.1007\/978-3-030-58520-4_13"},{"key":"1039_CR61","doi-asserted-by":"crossref","unstructured":"Tang X, Wang T, Fu C-W (2021) Towards accurate alignment in real-time 3D hand-mesh reconstruction. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 11698\u201311707","DOI":"10.1109\/ICCV48922.2021.01149"},{"key":"1039_CR62","doi-asserted-by":"crossref","unstructured":"Tse THE, Kim KI, Leonardis A, Chang HJ (2022) Collaborative learning for hand and object reconstruction with attention-guided graph convolution. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1664\u20131674","DOI":"10.1109\/CVPR52688.2022.00171"},{"key":"1039_CR63","doi-asserted-by":"crossref","unstructured":"Tu Z, Huang Z, Chen Y, Kang D, Bao L, Yang B, Yuan J (2022) Consistent 3d hand reconstruction in video via self-supervised learning. arXiv preprint arXiv:2201.09548","DOI":"10.1109\/TPAMI.2023.3247907"},{"issue":"8","key":"1039_CR64","doi-asserted-by":"publisher","first-page":"9469","DOI":"10.1109\/TPAMI.2023.3247907","volume":"45","author":"Z Tu","year":"2023","unstructured":"Tu Z, Huang Z, Chen Y, Kang D, Bao L, Yang B, Yuan J (2023) Consistent 3D hand reconstruction in video via self-supervised learning. 
IEEE Trans Patt Anal Mach Intell 45(8):9469\u20139485","journal-title":"IEEE Trans Patt Anal Mach Intell"},{"key":"1039_CR65","first-page":"261","volume":"30","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:261\u2013272","journal-title":"Adv Neural Inf Process Syst"},{"issue":"6","key":"1039_CR66","first-page":"1","volume":"39","author":"J Wang","year":"2020","unstructured":"Wang J, Mueller F, Bernard F, Sorli S, Sotnychenko O, Qian N, Otaduy MA, Casas D, Theobalt C (2020a) Rgb2hands: real-time tracking of 3d hand interactions from monocular RGB video. ACM Trans Graph (ToG) 39(6):1\u201316","journal-title":"ACM Trans Graph (ToG)"},{"issue":"10","key":"1039_CR67","doi-asserted-by":"publisher","first-page":"3349","DOI":"10.1109\/TPAMI.2020.2983686","volume":"43","author":"J Wang","year":"2020","unstructured":"Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X et al (2020b) Deep high-resolution representation learning for visual recognition. IEEE Trans Patt Anal Mach Intell 43(10):3349\u20133364","journal-title":"IEEE Trans Patt Anal Mach Intell"},{"key":"1039_CR68","doi-asserted-by":"crossref","unstructured":"Wang N, Zhang Y, Li Z, Fu Y, Liu W, Jiang Y-G (2018) Pixel2mesh: generating 3d mesh models from single RGB images. In: Proceedings of the European conference on computer vision (ECCV), pp 52\u201367","DOI":"10.1007\/978-3-030-01252-6_4"},{"key":"1039_CR69","doi-asserted-by":"crossref","unstructured":"Wei S-E, Ramakrishna V, Kanade T, Sheikh Y (2016) Convolutional pose machines. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4724\u20134732","DOI":"10.1109\/CVPR.2016.511"},{"key":"1039_CR70","doi-asserted-by":"crossref","unstructured":"Xu H, Wang T, Tang X, Fu C-W (2023) H2onet: Hand-occlusion-and-orientation-aware network for real-time 3D hand mesh reconstruction. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 17048\u201317058","DOI":"10.1109\/CVPR52729.2023.01635"},{"key":"1039_CR71","unstructured":"Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv preprint arXiv:1810.00826"},{"key":"1039_CR72","doi-asserted-by":"crossref","unstructured":"Yang J, Chang HJ, Lee S, Kwak N (2020) Seqhand: RGB-sequence-based 3d hand pose and shape estimation. In: Computer vision\u2013ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XII 16, Springer, pp 122\u2013139","DOI":"10.1007\/978-3-030-58610-2_8"},{"key":"1039_CR73","doi-asserted-by":"crossref","unstructured":"Yang L, Yao A (2019) Disentangling latent hands for image synthesis and pose estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 9877\u20139886","DOI":"10.1109\/CVPR.2019.01011"},{"key":"1039_CR74","doi-asserted-by":"crossref","unstructured":"Yang L, Li S, Lee D, Yao A (2019) Aligning latent spaces for 3d hand pose estimation. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 2335\u20132343","DOI":"10.1109\/ICCV.2019.00242"},{"key":"1039_CR75","doi-asserted-by":"crossref","unstructured":"Yang L, Chen S, Yao A (2021) Semihand: Semi-supervised hand pose estimation with consistency. 
In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 11364\u201311373","DOI":"10.1109\/ICCV48922.2021.01117"},{"key":"1039_CR76","doi-asserted-by":"crossref","unstructured":"Ye Y, Hebbar P, Gupta A, Tulsiani S (2023) Diffusion-guided reconstruction of everyday hand-object interaction clips. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 19717\u201319728","DOI":"10.1109\/ICCV51070.2023.01806"},{"key":"1039_CR77","doi-asserted-by":"crossref","unstructured":"Yu Z, Li C, Yang L, Zheng X, Mi MB, Lee GH, Yao A (2023) Overcoming the trade-off between accuracy and plausibility in 3D hand shape reconstruction. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 544\u2013553","DOI":"10.1109\/CVPR52729.2023.00060"},{"issue":"4","key":"1039_CR78","first-page":"1","volume":"38","author":"H Zhang","year":"2019","unstructured":"Zhang H, Bo Z-H, Yong J-H, Xu F (2019a) Interactionfusion: real-time reconstruction of hand poses and deformable objects in hand-object interactions. ACM Trans Graph (TOG) 38(4):1\u201311","journal-title":"ACM Trans Graph (TOG)"},{"key":"1039_CR79","doi-asserted-by":"crossref","unstructured":"Zhang X, Li Q, Mo H, Zhang W, Zheng W (2019b) End-to-end hand mesh recovery from a monocular RGB image. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 2354\u20132364","DOI":"10.1109\/ICCV.2019.00244"},{"key":"1039_CR80","doi-asserted-by":"crossref","unstructured":"Zhang X, Huang H, Tan J, Xu H, Yang C, Peng G, Wang L, Liu J (2021) Hand image understanding via deep multi-task learning. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 11281\u201311292","DOI":"10.1109\/ICCV48922.2021.01109"},{"key":"1039_CR81","doi-asserted-by":"crossref","unstructured":"Zhao Z, Zhao X, Wang Y (2021) Travelnet: self-supervised physically plausible hand motion learning from monocular color images. 
In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 11666\u201311676","DOI":"10.1109\/ICCV48922.2021.01146"},{"key":"1039_CR82","doi-asserted-by":"crossref","unstructured":"Zheng X, Ren P, Sun H, Wang J, Qi Q, Liao J (2021) Sar: spatial-aware regression for 3D hand pose and mesh reconstruction from a monocular RGB image. In: 2021 IEEE international symposium on mixed and augmented reality (ISMAR), IEEE, pp 99\u2013108","DOI":"10.1109\/ISMAR52148.2021.00024"},{"key":"1039_CR83","doi-asserted-by":"crossref","unstructured":"Zhou Y, Habermann M, Xu W, Habibie I, Theobalt C, Xu F (2020) Monocular real-time hand shape and motion capture using multi-modal data. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5346\u20135355","DOI":"10.1109\/CVPR42600.2020.00539"},{"key":"1039_CR84","doi-asserted-by":"crossref","unstructured":"Zimmermann C, Brox T (2017) Learning to estimate 3d hand pose from single RGB images. In: Proceedings of the IEEE international conference on computer vision, pp 4903\u20134911","DOI":"10.1109\/ICCV.2017.525"},{"key":"1039_CR85","doi-asserted-by":"crossref","unstructured":"Zimmermann C, Ceylan D, Yang J, Russell B, Argus M, Brox T (2019) Freihand: a dataset for markerless capture of hand pose and shape from single RGB images. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 813\u2013822","DOI":"10.1109\/ICCV.2019.00090"},{"key":"1039_CR86","doi-asserted-by":"crossref","unstructured":"Zuo B, Zhao Z, Sun W, Xie W, Xue Z, Wang Y (2023) Reconstructing interacting hands with interaction prior from monocular images. 
In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 9054\u20139064","DOI":"10.1109\/ICCV51070.2023.00831"}],"container-title":["Virtual Reality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10055-024-01039-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10055-024-01039-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10055-024-01039-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,13]],"date-time":"2024-09-13T04:30:26Z","timestamp":1726201826000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10055-024-01039-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,1]]},"references-count":86,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["1039"],"URL":"https:\/\/doi.org\/10.1007\/s10055-024-01039-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-4115590\/v1","asserted-by":"object"}]},"ISSN":["1434-9957"],"issn-type":[{"value":"1434-9957","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,1]]},"assertion":[{"value":"17 March 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 July 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 August 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"143"}}