{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,15]],"date-time":"2026-02-15T02:57:09Z","timestamp":1771124229964,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2024,12,19]]},"abstract":"<jats:p>\n            We present EgoHDM, an online egocentric-inertial human motion capture (mocap), localization, and dense mapping system. Our system uses 6 inertial measurement units (IMUs) and a commodity head-mounted RGB camera. EgoHDM is the first human mocap system that offers\n            <jats:italic>dense<\/jats:italic>\n            scene mapping in\n            <jats:italic>near real-time.<\/jats:italic>\n            Further, it is fast and robust to initialize and fully closes the loop between physically plausible map-aware global human motion estimation and mocap-aware 3D scene reconstruction. To achieve this, we design a tightly coupled mocap-aware dense bundle adjustment and physics-based body pose correction module leveraging a local body-centric elevation map. The latter introduces a novel terrain-aware contact PD controller, which enables characters to physically contact the given local elevation map thereby reducing human floating or penetration. We demonstrate the performance of our system on established synthetic and real-world benchmarks. The results show that our method reduces human localization, camera pose, and mapping accuracy error by 41%, 71%, 46%, respectively, compared to the state of the art. 
Our qualitative evaluations on newly captured data further demonstrate that EgoHDM can cover challenging scenarios in non-flat terrain including stepping over stairs and outdoor scenes in the wild. Our project page: https:\/\/handiyin.github.io\/EgoHDM\/\n          <\/jats:p>","DOI":"10.1145\/3687907","type":"journal-article","created":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T15:46:04Z","timestamp":1732031164000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["EgoHDM: A Real-time Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-7287-3879","authenticated-orcid":false,"given":"Handi","family":"Yin","sequence":"first","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6823-0925","authenticated-orcid":false,"given":"Bonan","family":"Liu","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5309-319X","authenticated-orcid":false,"given":"Manuel","family":"Kaufmann","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Zurich, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8325-6073","authenticated-orcid":false,"given":"Jinhao","family":"He","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3511-8565","authenticated-orcid":false,"given":"Sammy","family":"Christen","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Zurich, 
Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7484-1937","authenticated-orcid":false,"given":"Jie","family":"Song","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"},{"name":"Hong Kong University of Science and Technology, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6026-1083","authenticated-orcid":false,"given":"Pan","family":"Hui","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"},{"name":"Hong Kong University of Science and Technology, Hong Kong, China"}]}],"member":"320","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2021.3075644"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00334"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 682--692","author":"Dai Yudi","year":"2023","unstructured":"Yudi Dai, Yitai Lin, Xiping Lin, Chenglu Wen, Lan Xu, Hongwei Yi, Siqi Shen, Yuexin Ma, and Cheng Wang. 2023. SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 682--692."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00667"},{"key":"e_1_2_1_5_1","unstructured":"Yuming Du Robin Kips Albert Pumarola Sebastian Starke Ali Thabet and Artsiom Sanakoyeu. 2023. Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model. 
In CVPR."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-016-9574-0"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00430"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3272127.3275108","article-title":"Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time","volume":"37","author":"Huang Yinghao","year":"2018","unstructured":"Yinghao Huang, Manuel Kaufmann, Emre Aksan, Michael J Black, Otmar Hilliges, and Gerard Pons-Moll. 2018. Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time. ACM Transactions on Graphics (TOG) 37, 6 (2018), 1--15.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.373"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20065-6_26"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550469.3555428"},{"key":"e_1_2_1_12_1","volume-title":"International Conference on Computer Vision (ICCV).","author":"Kaufmann Manuel","year":"2023","unstructured":"Manuel Kaufmann, Jie Song, Chen Guo, Kaiyue Shen, Tianjian Jiang, Chengcheng Tang, Juan Jos\u00e9 Z\u00e1rate, and Otmar Hilliges. 2023. EMDB: The Electromagnetic Database of Global 3D Human Pose and Shape in the Wild. In International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01131"},{"key":"e_1_2_1_14_1","volume-title":"Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera. arXiv preprint arXiv:2401.00847","author":"Lee Jiye","year":"2024","unstructured":"Jiye Lee and Hanbyul Joo. 2024. Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera. 
arXiv preprint arXiv:2401.00847 (2024)."},{"key":"e_1_2_1_15_1","volume-title":"arXiv preprint arXiv:2402.09944","author":"Liso Lorenzo","year":"2024","unstructured":"Lorenzo Liso, Erik Sandstr\u00f6m, Vladimir Yugay, Luc Van Gool, and Martin R Oswald. 2024. Loopy-SLAM: Dense Neural SLAM with Loop Closures. arXiv preprint arXiv:2402.09944 (2024)."},{"key":"e_1_2_1_16_1","volume-title":"2023 20th Conference on Robots and Vision (CRV). IEEE, 37--44","author":"Lisus Daniil","year":"2023","unstructured":"Daniil Lisus, Connor Holmes, and Steven Waslander. 2023. Towards open world nerf-based slam. In 2023 20th Conference on Robots and Vision (CRV). IEEE, 37--44."},{"key":"e_1_2_1_17_1","volume-title":"2021 international conference on 3D vision (3DV). IEEE, 930--939","author":"Liu Miao","year":"2021","unstructured":"Miao Liu, Dexin Yang, Yan Zhang, Zhaopeng Cui, James M Rehg, and Siyu Tang. 2021. 4d human body capture from egocentric video via 3d scene grounding. In 2021 international conference on 3D vision (3DV). IEEE, 930--939."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3596711.3596800"},{"key":"e_1_2_1_19_1","first-page":"25019","article-title":"Dynamics-regulated kinematic policy for egocentric pose estimation","volume":"34","author":"Luo Zhengyi","year":"2021","unstructured":"Zhengyi Luo, Ryo Hachiuma, Ye Yuan, and Kris Kitani. 2021. Dynamics-regulated kinematic policy for egocentric pose estimation. Advances in Neural Information Processing Systems 34 (2021), 25019--25032.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_20_1","volume-title":"2022 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2273--2280","author":"Miki Takahiro","year":"2022","unstructured":"Takahiro Miki, Lorenz Wellhausen, Ruben Grandia, Fabian Jenelten, Timon Homberger, and Marco Hutter. 2022. Elevation mapping for locomotion and navigation using gpu. 
In 2022 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2273--2280."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561230"},{"key":"e_1_2_1_22_1","unstructured":"Noitom. 2024. Retrieved Jan 16 2024 from https:\/\/www.noitom.com\/"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2011.5979561"},{"key":"e_1_2_1_24_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32 (2019)."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","unstructured":"Monique Paulich Martin Schepers Nina Rudigkeit and G. Bellusci. 2018. Xsens MTw Awinda: Miniature Wireless Inertial-Magnetic Motion Tracker for Highly Accurate 3D Kinematic Applications. 10.13140\/RG.2.2.23576.49929","DOI":"10.13140\/RG.2.2.23576.49929"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01494"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2980235"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.gmod.2015.04.001"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS55552.2023.10341922"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 3097--3105","author":"Rosinol Antoni","year":"2023","unstructured":"Antoni Rosinol, John J Leonard, and Luca Carlone. 2023b. Probabilistic volumetric fusion for dense monocular slam. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 
3097--3105."},{"key":"e_1_2_1_31_1","volume-title":"European Conference on Computer Vision. Springer, 702--720","author":"Shao Ruizhi","year":"2022","unstructured":"Ruizhi Shao, Zerong Zheng, Hongwen Zhang, Jingxiang Sun, and Yebin Liu. 2022. Diffustereo: High quality human reconstruction via diffusion-based stereo using sparse cameras. In European Conference on Computer Vision. Springer, 702--720."},{"key":"e_1_2_1_32_1","volume-title":"WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion. arXiv preprint arXiv:2312.07531","author":"Shin Soyong","year":"2023","unstructured":"Soyong Shin, Juyong Kim, Eni Halilaj, and Michael J Black. 2023. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion. arXiv preprint arXiv:2312.07531 (2023)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/1632592.1632620"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1966394.1966397"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings, Part II 16","author":"Teed Zachary","year":"2020","unstructured":"Zachary Teed and Jia Deng. 2020. Raft: Recurrent all-pairs field transforms for optical flow. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part II 16. Springer, 402--419."},{"key":"e_1_2_1_36_1","volume-title":"DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D cameras. Advances in neural information processing systems 34","author":"Teed Zachary","year":"2021","unstructured":"Zachary Teed and Jia Deng. 2021. DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D cameras. Advances in neural information processing systems 34 (2021), 16558--16569."},{"key":"e_1_2_1_37_1","volume-title":"Selfpose: 3d egocentric pose estimation from a headset mounted camera","author":"Tome Denis","year":"2020","unstructured":"Denis Tome, Thiemo Alldieck, Patrick Peluse, Gerard Pons-Moll, Lourdes Agapito, Hernan Badino, and Fernando De la Torre. 2020. 
Selfpose: 3d egocentric pose estimation from a headset mounted camera. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.31.14"},{"key":"e_1_2_1_39_1","volume-title":"Computer graphics forum","author":"Marcard Timo Von","unstructured":"Timo Von Marcard, Bodo Rosenhahn, Michael J Black, and Gerard Pons-Moll. 2017. Sparse inertial poser: Automatic 3d human pose estimation from sparse imus. In Computer graphics forum, Vol. 36. Wiley Online Library, 349--360."},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 22024--22034","author":"Wang Chen","year":"2023","unstructured":"Chen Wang, Dasong Gao, Kuan Xu, Junyi Geng, Yaoyu Hu, Yuheng Qiu, Bowen Li, Fan Yang, Brady Moon, Abhinav Pandey, et al. 2023a. Pypose: A library for robot learning with physics-based optimization. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 22024--22034."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01130"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01252"},{"key":"e_1_2_1_43_1","volume-title":"Mo2Cap2: Real-time mobile 3d motion capture with a cap-mounted fisheye camera","author":"Xu Weipeng","year":"2019","unstructured":"Weipeng Xu, Avishek Chatterjee, Michael Zollhoefer, Helge Rhodin, Pascal Fua, Hans-Peter Seidel, and Christian Theobalt. 2019. Mo2Cap2: Real-time mobile 3d motion capture with a cap-mounted fisheye camera. IEEE transactions on visualization and computer graphics 25, 5 (2019), 2093--2101."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.15057"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISMAR55827.2022.00066"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 
21222--21232","author":"Ye Vickie","year":"2023","unstructured":"Vickie Ye, Georgios Pavlakos, Jitendra Malik, and Angjoo Kanazawa. 2023. Decoupling human and camera motion from videos in the wild. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 21222--21232."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592099"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01282"},{"key":"e_1_2_1_49_1","first-page":"1","article-title":"Transpose: Real-time 3d human translation and pose estimation with six inertial sensors","volume":"40","author":"Yi Xinyu","year":"2021","unstructured":"Xinyu Yi, Yuxiao Zhou, and Feng Xu. 2021. Transpose: Real-time 3d human translation and pose estimation with six inertial sensors. ACM Transactions on Graphics (TOG) 40, 4 (2021), 1--13.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.01018"},{"key":"e_1_2_1_51_1","volume-title":"2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 6232--6238","author":"Zhang Wei","year":"2023","unstructured":"Wei Zhang, Sen Wang, Xingliang Dong, Rongwei Guo, and Norbert Haala. 2023b. Bamf-slam: Bundle adjusted multi-fisheye visual-inertial slam using recurrent field transforms. In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 6232--6238."},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3727--3737","author":"Zhang Youmin","year":"2023","unstructured":"Youmin Zhang, Fabio Tosi, Stefano Mattoccia, and Matteo Poggi. 2023a. Go-slam: Global optimization for consistent 3d instant reconstruction. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3727--3737."},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 
1772--1781","author":"Zhang Yahui","year":"2021","unstructured":"Yahui Zhang, Shaodi You, and Theo Gevers. 2021. Automatic calibration of the fisheye camera for egocentric 3d human pose estimation from a single image. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 1772--1781."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01245"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687907","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687907","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:57Z","timestamp":1750295397000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687907"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,19]]},"references-count":54,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,19]]}},"alternative-id":["10.1145\/3687907"],"URL":"https:\/\/doi.org\/10.1145\/3687907","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,19]]},"assertion":[{"value":"2024-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}