{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T21:08:37Z","timestamp":1776114517305,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,8,12]],"date-time":"2020-08-12T00:00:00Z","timestamp":1597190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2020,8,31]]},"abstract":"<jats:p>We present a system for real-time hand-tracking to drive virtual and augmented reality (VR\/AR) experiences. Using four fisheye monochrome cameras, our system generates accurate and low-jitter 3D hand motion across a large working volume for a diverse set of users. We achieve this by proposing neural network architectures for detecting hands and estimating hand keypoint locations. Our hand detection network robustly handles a variety of real world environments. The keypoint estimation network leverages tracking history to produce spatially and temporally consistent poses. We design scalable, semi-automated mechanisms to collect a large and diverse set of ground truth data using a combination of manual annotation and automated tracking. Additionally, we introduce a detection-by-tracking method that increases smoothness while reducing the computational cost; the optimized system runs at 60Hz on PC and 30Hz on a mobile processor. 
Together, these contributions yield a practical system for capturing a user's hands and is the default feature on the Oculus Quest VR headset powering input and social presence.<\/jats:p>","DOI":"10.1145\/3386569.3392452","type":"journal-article","created":{"date-parts":[[2020,8,12]],"date-time":"2020-08-12T11:44:27Z","timestamp":1597232667000},"update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":188,"title":["MEgATrack"],"prefix":"10.1145","volume":"39","author":[{"given":"Shangchen","family":"Han","sequence":"first","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Beibei","family":"Liu","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Randi","family":"Cabezas","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Christopher D.","family":"Twigg","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Peizhao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Jeff","family":"Petkau","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Tsz-Ho","family":"Yu","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Chun-Jung","family":"Tai","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Muzaffer","family":"Akbay","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Zheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Asaf","family":"Nitzan","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Gang","family":"Dong","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Yuting","family":"Ye","sequence":"additional","affiliation":[{"name":"Facebook Reality 
Labs"}]},{"given":"Lingling","family":"Tao","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Chengde","family":"Wan","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]},{"given":"Robert","family":"Wang","sequence":"additional","affiliation":[{"name":"Facebook Reality Labs"}]}],"member":"320","published-online":{"date-parts":[[2020,8,12]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Vassilis Athitsos and Stan Sclaroff. 2003. Estimating 3D hand pose from a cluttered image. Technical Report. Boston University Computer Science Department."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33783-3_46"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01110"},{"key":"e_1_2_2_4_1","volume-title":"Weakly-supervised 3d hand pose estimation from monocular rgb images. ECCV","author":"Cai Yujun","year":"2018","unstructured":"Yujun Cai, Liuhao Ge, Jianfei Cai, and Junsong Yuan. 2018. Weakly-supervised 3d hand pose estimation from monocular rgb images. ECCV, Springer 12 (2018)."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00236"},{"key":"e_1_2_2_6_1","volume-title":"ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Dai Xiaoliang","unstructured":"Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, and Niraj K. Jha. 2019. ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587752"},{"key":"e_1_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Endri Dibra Thomas Wolf Cengiz Oztireli and Markus Gross. 2017. 
How to Refine 3D Hand Pose Estimation from Unlabelled Depth Data?. In 3DV.","DOI":"10.1109\/3DV.2017.00025"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01109"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201399"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01208"},{"key":"e_1_2_2_12_1","volume-title":"Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV). 2980--2988","author":"He K.","unstructured":"K. He, G. Gkioxari, P. Doll\u00e1r, and R. Girshick. 2017. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV). 2980--2988."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275108"},{"key":"e_1_2_2_15_1","volume-title":"Proceedings of European Conference on Computer Vision.","author":"Iqbal Umar","year":"2018","unstructured":"Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, and Jan Kautz. 2018. Hand Pose Estimation via Latent 2.5 D Heatmap Regression. In Proceedings of European Conference on Computer Vision."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.153"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00013"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322958"},{"key":"e_1_2_2_20_1","volume-title":"Proceedings of International Conference on Computer Vision (ICCV). 10","author":"Mueller Franziska","year":"2017","unstructured":"Franziska Mueller, Dushyant Mehta, Oleksandr Sotnychenko, Srinath Sridhar, Dan Casas, and Christian Theobalt. 2017. Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor. In Proceedings of International Conference on Computer Vision (ICCV). 10. 
https:\/\/handtracker.mpi-inf.mpg.de\/projects\/OccludedHands\/"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.536"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6247885"},{"key":"e_1_2_2_23_1","volume-title":"Face and Gesture","author":"Prisacariu Victor Adrian","unstructured":"Victor Adrian Prisacariu and Ian Reid. 2011. Robust 3D hand tracking for human computer interaction. In Face and Gesture. IEEE, 368--375."},{"key":"e_1_2_2_24_1","volume-title":"Stronger. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Redmon Joseph","year":"2017","unstructured":"Joseph Redmon and Ali Farhadi. 2017. YOLO9000: Better, Faster, Stronger. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_25_1","volume-title":"Black","author":"Romero Javier","year":"2017","unstructured":"Javier Romero, Dimitrios Tzionas, and Michael J. Black. 2017. Embodied Hands: Modeling and Capturing Hands and Bodies Together. ACM Transactions on Graphics, (Proc. SIGGRAPH Asia) 36, 6 (Nov. 2017)."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Toby Sharp Cem Keskin Duncan Robertson Jonathan Taylor Jamie Shotton David Kim Christoph Rhemann Ido Leichter Alon Vinnikov Yichen Wei et al. 2015. Accurate robust and flexible real-time hand tracking. 
In CHI.","DOI":"10.1145\/2702123.2702179"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.241"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.494"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00017"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_19"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.189"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.217"},{"key":"e_1_2_2_34_1","volume-title":"Robust Articulated-ICP for Real-Time Hand Tracking. Computer Graphics Forum (Symposium on Geometry Processing) 34, 5","author":"Tagliasacchi Andrea","year":"2015","unstructured":"Andrea Tagliasacchi, Matthias Schroeder, Anastasia Tkach, Sofien Bouaziz, Mario Botsch, and Mark Pauly. 2015. Robust Articulated-ICP for Real-Time Hand Tracking. Computer Graphics Forum (Symposium on Geometry Processing) 34, 5 (2015)."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925965"},{"key":"e_1_2_2_36_1","volume-title":"Articulated distance fields for ultra-fast tracking of hands interacting. ACM Transactions on Graphics (TOG)","author":"Taylor Jonathan","year":"2017","unstructured":"Jonathan Taylor, Vladimir Tankovich, Danhang Tang, Cem Keskin, David Kim, Philip Davidson, Adarsh Kowdle, and Shahram Izadi. 2017. Articulated distance fields for ultra-fast tracking of hands interacting. ACM Transactions on Graphics (TOG) (2017)."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00464"},{"key":"e_1_2_2_38_1","volume-title":"Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks. ACM Transactions on Graphics (TOG)","author":"Tompson Jonathan","year":"2014","unstructured":"Jonathan Tompson, Murphy Stein, Yann Lecun, and Ken Perlin. 2014. Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks. 
ACM Transactions on Graphics (TOG) (2014)."},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01111"},{"key":"e_1_2_2_40_1","volume-title":"Real-time hand-tracking with a color glove. ACM transactions on graphics (TOG) 28, 3","author":"Wang Robert Y","year":"2009","unstructured":"Robert Y Wang and Jovan Popovi\u0107. 2009. Real-time hand-tracking with a color glove. ACM transactions on graphics (TOG) 28, 3 (2009), 63."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01011"},{"key":"e_1_2_2_42_1","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Yuan Shanxin","year":"2017","unstructured":"Shanxin Yuan, Qi Ye, Bjorn Stenger, Siddhand Jain, and Tae-Kyun Kim. 2017. BigHand2.2M Benchmark: Hand Pose Dataset and State of the Art Analysis. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3322998"},{"key":"e_1_2_2_44_1","volume-title":"3d hand pose tracking and estimation using stereo matching. arXiv preprint arXiv:1610.07214","author":"Zhang Jiawei","year":"2016","unstructured":"Jiawei Zhang, Jianbo Jiao, Mingliang Chen, Liangqiong Qu, Xiaobin Xu, and Qingxiong Yang. 2016. 3d hand pose tracking and estimation using stereo matching. arXiv preprint arXiv:1610.07214 (2016)."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00244"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.525"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.525"},{"key":"e_1_2_2_48_1","doi-asserted-by":"crossref","unstructured":"Christian Zimmermann Duygu Ceylan Jimei Yang Bryan Russell Max Argus and Thomas Brox. 2019. FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images. 
arXiv:1909.04349 [cs.CV]","DOI":"10.1109\/ICCV.2019.00090"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3386569.3392452","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3386569.3392452","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,25]],"date-time":"2025-06-25T05:40:25Z","timestamp":1750830025000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3386569.3392452"}},"subtitle":["monochrome egocentric articulated hand-tracking for virtual reality"],"short-title":[],"issued":{"date-parts":[[2020,8,12]]},"references-count":48,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,8,31]]}},"alternative-id":["10.1145\/3386569.3392452"],"URL":"https:\/\/doi.org\/10.1145\/3386569.3392452","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,12]]},"assertion":[{"value":"2020-08-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}