{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T01:30:19Z","timestamp":1769045419249,"version":"3.49.0"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T00:00:00Z","timestamp":1701734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2023,12,5]]},"abstract":"<jats:p>This paper addresses the challenging task of reconstructing the poses of multiple individuals engaged in close interactions, captured by multiple calibrated cameras. The difficulty arises from the noisy or false 2D keypoint detections due to inter-person occlusion, the heavy ambiguity in associating keypoints to individuals due to the close interactions, and the scarcity of training data as collecting and annotating motion data in crowded scenes is resource-intensive. We introduce a novel system to address these challenges. Our system integrates a learning-based pose estimation component and its corresponding training and inference strategies. The pose estimation component takes multi-view 2D keypoint heatmaps as input and reconstructs the pose of each individual using a 3D conditional volumetric network. As the network doesn't need images as input, we can leverage known camera parameters from test scenes and a large quantity of existing motion capture data to synthesize massive training data that mimics the real data distribution in test scenes. Extensive experiments demonstrate that our approach significantly surpasses previous approaches in terms of pose accuracy and is generalizable across various camera setups and population sizes. The code is available on our project page: https:\/\/github.com\/zju3dv\/CloseMoCap.<\/jats:p>","DOI":"10.1145\/3618336","type":"journal-article","created":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T10:20:48Z","timestamp":1701771648000},"page":"1-14","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Reconstructing Close Human Interactions from Multiple Views"],"prefix":"10.1145","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2096-7599","authenticated-orcid":false,"given":"Qing","family":"Shuai","sequence":"first","affiliation":[{"name":"State Key Laboratory of CAD&amp;CG, Zhejiang University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7220-7789","authenticated-orcid":false,"given":"Zhiyuan","family":"Yu","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9576-1686","authenticated-orcid":false,"given":"Zhize","family":"Zhou","sequence":"additional","affiliation":[{"name":"Capital University of Physical Education and Sports, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8162-7096","authenticated-orcid":false,"given":"Lixin","family":"Fan","sequence":"additional","affiliation":[{"name":"WeBank, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0491-7058","authenticated-orcid":false,"given":"Haijun","family":"Yang","sequence":"additional","affiliation":[{"name":"WeBank, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4407-3055","authenticated-orcid":false,"given":"Can","family":"Yang","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1926-5597","authenticated-orcid":false,"given":"Xiaowei","family":"Zhou","sequence":"additional","affiliation":[{"name":"State Key Laboratory of CAD&amp;CG, Zhejiang University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,12,5]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3010742"},{"key":"e_1_2_2_2_1","volume-title":"HSPACE: Synthetic parametric humans animated in complex environments. arXiv preprint arXiv:2112.12867","author":"Bazavan Eduard Gabriel","year":"2021","unstructured":"Eduard Gabriel Bazavan, Andrei Zanfir, Mihai Zanfir, William T Freeman, Rahul Sukthankar, and Cristian Sminchisescu. 2021. HSPACE: Synthetic parametric humans animated in complex environments. arXiv preprint arXiv:2112.12867 (2021)."},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Vasileios Belagiannis Sikandar Amin Mykhaylo Andriluka Bernt Schiele Nassir Navab and Slobodan Ilic. 2014. 3D pictorial structures for multiple human pose estimation. In CVPR. 1669--1676.","DOI":"10.1109\/CVPR.2014.216"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2509986"},{"key":"e_1_2_2_5_1","volume-title":"Quoc Cuong Pham, and Catherine Achard","author":"Benzine Abdallah","year":"2020","unstructured":"Abdallah Benzine, Florian Chabot, Bertrand Luvison, Quoc Cuong Pham, and Catherine Achard. 2020. Pandanet: Anchor-based single-shot multi-person 3d pose estimation. In CVPR. 6856--6865."},{"key":"e_1_2_2_6_1","volume-title":"BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion. In CVPR. 8726--8737.","author":"Black Michael J.","year":"2023","unstructured":"Michael J. Black, Priyanka Patel, Joachim Tesch, and Jinlong Yang. 2023. BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion. In CVPR. 8726--8737."},{"key":"e_1_2_2_7_1","volume-title":"Lei Yang, and Ziwei Liu.","author":"Cai Zhongang","year":"2022","unstructured":"Zhongang Cai, Daxuan Ren, Ailing Zeng, Zhengyu Lin, Tao Yu, Wenjia Wang, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, and Ziwei Liu. 2022. HuMMan: Multi-modal 4d human dataset for versatile sensing and modeling. In ECCV. Springer, 557--577."},{"key":"e_1_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Zhe Cao Tomas Simon Shih-En Wei and Yaser Sheikh. 2017. Realtime multi-person 2d pose estimation using part affinity fields. In CVPR. 7291--7299.","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_2_9_1","volume-title":"Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement","author":"Cha Junuk","unstructured":"Junuk Cha, Muhammad Saqlain, GeonU Kim, Mingyu Shin, and Seungryul Baek. 2022. Multi-Person 3D Pose and Shape Estimation via Inverse Kinematics and Refinement. In ECCV. Springer, 660--677."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3026276"},{"key":"e_1_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Ching-Hang Chen Ambrish Tyagi Amit Agrawal Dylan Drover Rohith Mv Stefan Stojanov and James M Rehg. 2019. Unsupervised 3d pose estimation with geometric self-supervision. In CVPR. 5714--5724.","DOI":"10.1109\/CVPR.2019.00586"},{"key":"e_1_2_2_12_1","unstructured":"CMU Graphics Lab. 2000. CMU Graphics Lab Motion Capture Database. http:\/\/mocap.cs.cmu.edu\/."},{"key":"e_1_2_2_13_1","doi-asserted-by":"crossref","unstructured":"Junting Dong Wen Jiang Qixing Huang Hujun Bao and Xiaowei Zhou. 2019. Fast and robust multi-person 3d pose estimation from multiple views. In CVPR. 7792--7801.","DOI":"10.1109\/CVPR.2019.00798"},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Dylan Drover Rohith MV Ching-Hang Chen Amit Agrawal Ambrish Tyagi and Cong Phuoc Huynh. 2018. Can 3d pose be learned from 2d projections alone?. In ECCVW. 78--94.","DOI":"10.1007\/978-3-030-11018-5_7"},{"key":"e_1_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Mihai Fieraru Mihai Zanfir Elisabeta Oneata Alin-Ionut Popa Vlad Olaru and Cristian Sminchisescu. 2020. Three-dimensional reconstruction of human interactions. In CVPR. 7214--7223.","DOI":"10.1109\/CVPR42600.2020.00724"},{"key":"e_1_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Mihai Fieraru Mihai Zanfir Elisabeta Oneata Alin-Ionut Popa Vlad Olaru and Cristian Sminchisescu. 2021a. Learning complex 3d human self-contact. In AAAI. 1343--1351.","DOI":"10.1609\/aaai.v35i2.16223"},{"key":"e_1_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Mihai Fieraru Mihai Zanfir Silviu-Cristian Pirlea Vlad Olaru and Cristian Sminchisescu. 2021b. AIFit: Automatic 3D Human-Interpretable Feedback Models for Fitness Training. In CVPR. 9919--9928.","DOI":"10.1109\/CVPR46437.2021.00979"},{"key":"e_1_2_2_18_1","first-page":"19385","article-title":"Remips: Physically consistent 3d reconstruction of multiple interacting people under weak supervision","volume":"34","author":"Fieraru Mihai","year":"2021","unstructured":"Mihai Fieraru, Mihai Zanfir, Teodor Szente, Eduard Bazavan, Vlad Olaru, and Cristian Sminchisescu. 2021c. Remips: Physically consistent 3d reconstruction of multiple interacting people under weak supervision. NeurIPS 34 (2021), 19385--19397.","journal-title":"NeurIPS"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.5683\/SP2\/JRHDRN"},{"key":"e_1_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Wen Guo Xiaoyu Bie Xavier Alameda-Pineda and Francesc Moreno-Noguer. 2022. Multi-person extreme motion prediction. In CVPR. 13053--13064.","DOI":"10.1109\/CVPR52688.2022.01271"},{"key":"e_1_2_2_21_1","volume-title":"End-to-end dynamic matching network for multi-view multi-person 3d pose estimation","author":"Huang Congzhentao","unstructured":"Congzhentao Huang, Shuai Jiang, Yang Li, Ziyue Zhang, Jason Traish, Chen Deng, Sam Ferguson, and Richard Yi Da Xu. 2020. End-to-end dynamic matching network for multi-view multi-person 3d pose estimation. In ECCV. Springer, 477--493."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Karim Iskakov Egor Burkov Victor Lempitsky and Yury Malkov. 2019. Learnable triangulation of human pose. In ICCV. 7718--7727.","DOI":"10.1109\/ICCV.2019.00781"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","unstructured":"Glenn Jocher. 2020. Ultralytics YOLOv5. 10.5281\/zenodo.3908559","DOI":"10.5281\/zenodo.3908559"},{"key":"e_1_2_2_25_1","unstructured":"Hanbyul Joo Hao Liu Lei Tan Lin Gui Bart Nabbe Iain Matthews Takeo Kanade Shohei Nobuhara and Yaser Sheikh. 2015. Panoptic studio: A massively multiview system for social motion capture. In ICCV. 3334--3342."},{"key":"e_1_2_2_26_1","volume-title":"Exemplar fine-tuning for 3d human model fitting towards in-the-wild 3d human pose estimation. In 3DV","author":"Joo Hanbyul","unstructured":"Hanbyul Joo, Natalia Neverova, and Andrea Vedaldi. 2021. Exemplar fine-tuning for 3d human model fitting towards in-the-wild 3d human pose estimation. In 3DV. IEEE, 42--52."},{"key":"e_1_2_2_27_1","volume-title":"Dyadic human motion prediction. arXiv preprint arXiv:2112.00396","author":"Katircioglu Isinsu","year":"2021","unstructured":"Isinsu Katircioglu, Costa Georgantas, Mathieu Salzmann, and Pascal Fua. 2021. Dyadic human motion prediction. arXiv preprint arXiv:2112.00396 (2021)."},{"key":"e_1_2_2_28_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR."},{"key":"e_1_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Nikos Kolotouros Georgios Pavlakos Michael J Black and Kostas Daniilidis. 2019. Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In CVPR. 2252--2261.","DOI":"10.1109\/ICCV.2019.00234"},{"key":"e_1_2_2_30_1","volume-title":"Hdnet: Human depth estimation for multi-person camera-space localization","author":"Lin Jiahao","year":"2020","unstructured":"Jiahao Lin and Gim Hee Lee. 2020. Hdnet: Human depth estimation for multi-person camera-space localization. In ECCV. Springer, 633--648."},{"key":"e_1_2_2_31_1","unstructured":"Jiahao Lin and Gim Hee Lee. 2021. Multi-view multi-person 3d pose estimation with plane sweep stereo. In CVPR. 11886--11895."},{"key":"e_1_2_2_32_1","unstructured":"Tsung-Yi Lin Priya Goyal Ross Girshick Kaiming He and Piotr Doll\u00e1r. 2017. Focal loss for dense object detection. In ICCV. 2980--2988."},{"key":"e_1_2_2_33_1","volume-title":"Microsoft coco: Common objects in context","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV. Springer, 740--755."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480528"},{"key":"e_1_2_2_35_1","volume-title":"Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation","author":"Liu Qihao","unstructured":"Qihao Liu, Yi Zhang, Song Bai, and Alan Yuille. 2022. Explicit Occlusion Reasoning for Multi-person 3D Human Pose Estimation. In ECCV. Springer, 497--517."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818013"},{"key":"e_1_2_2_37_1","volume-title":"AMASS: Archive of motion capture as surface shapes. In ICCV. 5442--5451.","author":"Mahmood Naureen","year":"2019","unstructured":"Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Gerard Pons-Moll, and Michael J Black. 2019. AMASS: Archive of motion capture as surface shapes. In ICCV. 5442--5451."},{"key":"e_1_2_2_38_1","volume-title":"Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision. In 3DV","author":"Mehta Dushyant","unstructured":"Dushyant Mehta, Helge Rhodin, Dan Casas, Pascal Fua, Oleksandr Sotnychenko, Weipeng Xu, and Christian Theobalt. 2017. Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision. In 3DV. IEEE. http:\/\/gvv.mpi-inf.mpg.de\/3dhp_dataset"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392410"},{"key":"e_1_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Dushyant Mehta Oleksandr Sotnychenko Franziska Mueller Weipeng Xu Srinath Sridhar Gerard Pons-Moll and Christian Theobalt. 2018. Single-shot multi-person 3d pose estimation from monocular rgb. In 3DV. 120--130.","DOI":"10.1109\/3DV.2018.00024"},{"key":"e_1_2_2_41_1","volume-title":"KeypointNeRF: Generalizing image-based volumetric avatars using relative spatial encoding of keypoints","author":"Mihajlovic Marko","unstructured":"Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, and Shunsuke Saito. 2022. KeypointNeRF: Generalizing image-based volumetric avatars using relative spatial encoding of keypoints. In ECCV. Springer, 179--197."},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_2_2_43_1","unstructured":"Gyeongsik Moon Juyong Chang and Kyoung Mu Lee. 2019. Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image. In ICCV. 10133--10142."},{"key":"e_1_2_2_44_1","doi-asserted-by":"crossref","unstructured":"Ferda Ofli Rizwan Chaudhry Gregorij Kurillo Ren\u00e9 Vidal and Ruzena Bajcsy. 2013. Berkeley mhad: A comprehensive multimodal human action database. In WACV. 53--60.","DOI":"10.1109\/WACV.2013.6474999"},{"key":"e_1_2_2_45_1","volume-title":"AGORA: Avatars in geography optimized for regression analysis. In CVPR. 13468--13478.","author":"Patel Priyanka","year":"2021","unstructured":"Priyanka Patel, Chun-Hao P Huang, Joachim Tesch, David T Hoffmann, Shashank Tripathi, and Michael J Black. 2021. AGORA: Avatars in geography optimized for regression analysis. In CVPR. 13468--13478."},{"key":"e_1_2_2_46_1","volume-title":"Black","author":"Pavlakos Georgios","year":"2019","unstructured":"Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. 2019. Expressive Body Capture: 3D Hands, Face, and Body from a Single Image. In CVPR. 10975--10985."},{"key":"e_1_2_2_47_1","doi-asserted-by":"crossref","unstructured":"Sida Peng Yuanqing Zhang Yinghao Xu Qianqian Wang Qing Shuai Hujun Bao and Xiaowei Zhou. 2021. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In CVPR. 9054--9063.","DOI":"10.1109\/CVPR46437.2021.00894"},{"key":"e_1_2_2_48_1","volume-title":"PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers. In CVPR.","author":"Qiu Zhongwei","year":"2023","unstructured":"Zhongwei Qiu, Yang Qiansheng, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Chang Xu, Dongmei Fu, and Jingdong Wang. 2023. PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers. In CVPR."},{"key":"e_1_2_2_49_1","doi-asserted-by":"crossref","unstructured":"Davis Rempe Tolga Birdal Aaron Hertzmann Jimei Yang Srinath Sridhar and Leonidas J Guibas. 2021. Humor: 3d human motion model for robust pose estimation. In ICCV. 11488--11499.","DOI":"10.1109\/ICCV48922.2021.01129"},{"key":"e_1_2_2_50_1","volume-title":"Summary","author":"Robinette Kathleen M","year":"2002","unstructured":"Kathleen M Robinette, Sherri Blackwell, Hein Daanen, Mark Boehmer, Scott Fleming, Tina Brill, David Hoeferlin, and Dennis Burnsides. 2002. Civilian American and European surface anthropometry resource (CAESAR), final report, volume I: Summary. Sytronics Inc Dayton Oh (2002)."},{"key":"e_1_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Qing Shuai Chen Geng Qi Fang Sida Peng Wenhao Shen Xiaowei Zhou and Hujun Bao. 2022. Novel view synthesis of human interactions from sparse multi-view videos. In SIGGRAPH. 1--10.","DOI":"10.1145\/3528233.3530704"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0293-2"},{"key":"e_1_2_2_53_1","volume-title":"VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data","author":"Su Jiajun","unstructured":"Jiajun Su, Chunyu Wang, Xiaoxuan Ma, Wenjun Zeng, and Yizhou Wang. 2022. VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data. In ECCV. Springer, 55--71."},{"key":"e_1_2_2_54_1","doi-asserted-by":"crossref","unstructured":"Yu Sun Wu Liu Qian Bao Yili Fu Tao Mei and Michael J Black. 2022. Putting people in their place: Monocular regression of 3d people in depth. In CVPR. 13243--13252.","DOI":"10.1109\/CVPR52688.2022.01289"},{"key":"e_1_2_2_55_1","doi-asserted-by":"crossref","unstructured":"Matt Trumble Andrew Gilbert Charles Malleson Adrian Hilton and John Collomosse. 2017. Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors. In BMVC.","DOI":"10.5244\/C.31.14"},{"key":"e_1_2_2_56_1","volume-title":"Voxelpose: Towards multi-camera 3d human pose estimation in wild environment","author":"Tu Hanyue","year":"2020","unstructured":"Hanyue Tu, Chunyu Wang, and Wenjun Zeng. 2020. Voxelpose: Towards multi-camera 3d human pose estimation in wild environment. In ECCV. Springer, 197--212."},{"key":"e_1_2_2_57_1","doi-asserted-by":"crossref","unstructured":"Gul Varol Javier Romero Xavier Martin Naureen Mahmood Michael J Black Ivan Laptev and Cordelia Schmid. 2017. Learning from synthetic humans. In CVPR. 109--117.","DOI":"10.1109\/CVPR.2017.492"},{"key":"e_1_2_2_58_1","doi-asserted-by":"crossref","unstructured":"Timo Von Marcard Roberto Henschel Michael J Black Bodo Rosenhahn and Gerard Pons-Moll. 2018. Recovering accurate 3d human pose in the wild using imus and a moving camera. In ECCV. 601--617.","DOI":"10.1007\/978-3-030-01249-6_37"},{"key":"e_1_2_2_59_1","doi-asserted-by":"crossref","unstructured":"Bastian Wandt Marco Rudolph Petrissa Zell Helge Rhodin and Bodo Rosenhahn. 2021. CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild. In CVPR.","DOI":"10.1109\/CVPR46437.2021.01309"},{"key":"e_1_2_2_60_1","volume-title":"Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation","author":"Wang Can","year":"2020","unstructured":"Can Wang, Jiefeng Li, Wentao Liu, Chen Qian, and Cewu Lu. 2020a. Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation. In ECCV. Springer, 242--259."},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2983686"},{"key":"e_1_2_2_62_1","first-page":"13153","article-title":"Direct Multi-view Multi-person 3D Human Pose Estimation","volume":"34","author":"Wang Tao","year":"2021","unstructured":"Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, and Jiashi Feng. 2021. Direct Multi-view Multi-person 3D Human Pose Estimation. NeurIPS 34 (2021), 13153--13164.","journal-title":"NeurIPS"},{"key":"e_1_2_2_63_1","doi-asserted-by":"crossref","unstructured":"Zitian Wang Xuecheng Nie Xiaochao Qu Yunpeng Chen and Si Liu. 2022. Distribution-aware single-stage models for multi-person 3D pose estimation. In CVPR. 13096--13105.","DOI":"10.1109\/CVPR52688.2022.01275"},{"key":"e_1_2_2_64_1","volume-title":"Humannerf: Free-viewpoint rendering of moving people from monocular video. In CVPR. 16210--16220.","author":"Weng Chung-Yi","year":"2022","unstructured":"Chung-Yi Weng, Brian Curless, Pratul P Srinivasan, Jonathan T Barron, and Ira Kemelmacher-Shlizerman. 2022. Humannerf: Free-viewpoint rendering of moving people from monocular video. In CVPR. 16210--16220."},{"key":"e_1_2_2_65_1","unstructured":"Size Wu Sheng Jin Wentao Liu Lei Bai Chen Qian Dong Liu and Wanli Ouyang. 2021. Graph-based 3d multi-person pose estimation using multi-view images. In ICCV. 11148--11157."},{"key":"e_1_2_2_66_1","volume-title":"Andrei Zanfir","author":"Xu Hongyi","year":"2020","unstructured":"Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, William T Freeman, Rahul Sukthankar, and Cristian Sminchisescu. 2020. GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models. In CVPR. 6184--6193."},{"key":"e_1_2_2_67_1","volume-title":"Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection","author":"Ye Hang","unstructured":"Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, and Yizhou Wang. 2022. Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection. In ECCV. Springer, 142--159."},{"key":"e_1_2_2_68_1","unstructured":"Vickie Ye Georgios Pavlakos Jitendra Malik and Angjoo Kanazawa. 2023. Decoupling Human and Camera Motion from Videos in the Wild. In CVPR."},{"key":"e_1_2_2_69_1","doi-asserted-by":"crossref","unstructured":"Yifei Yin Chen Guo Manuel Kaufmann Juan Zarate Jie Song and Otmar Hilliges. 2023. Hi4D: 4D Instance Segmentation of Close Human Interaction. In CVPR. 17016--17027.","DOI":"10.1109\/CVPR52729.2023.01632"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3138762"},{"key":"e_1_2_2_71_1","volume-title":"GLAMR: Global occlusion-aware human mesh recovery with dynamic cameras. In CVPR. 11038--11049.","author":"Yuan Ye","year":"2022","unstructured":"Ye Yuan, Umar Iqbal, Pavlo Molchanov, Kris Kitani, and Jan Kautz. 2022. GLAMR: Global occlusion-aware human mesh recovery with dynamic cameras. In CVPR. 11038--11049."},{"key":"e_1_2_2_72_1","doi-asserted-by":"crossref","unstructured":"Yuxiang Zhang Liang An Tao Yu Xiu Li Kun Li and Yebin Liu. 2020. 4D association graph for realtime multi-person motion capture using multiple video cameras. In CVPR. 1324--1333.","DOI":"10.1109\/CVPR42600.2020.00140"},{"key":"e_1_2_2_73_1","volume-title":"Smap: Single-shot multi-person absolute 3d pose estimation","author":"Zhen Jianan","year":"2020","unstructured":"Jianan Zhen, Qi Fang, Jiaming Sun, Wentao Liu, Wei Jiang, Hujun Bao, and Xiaowei Zhou. 2020. Smap: Single-shot multi-person absolute 3d pose estimation. In ECCV. Springer, 550--566."},{"key":"e_1_2_2_74_1","doi-asserted-by":"crossref","unstructured":"Zhize Zhou Qing Shuai Yize Wang Qi Fang Xiaopeng Ji Fashuai Li Hujun Bao and Xiaowei Zhou. 2022. QuickPose: Real-time Multi-view Multi-person Pose Estimation in Crowded Scenes. In SIGGRAPH. 1--9.","DOI":"10.1145\/3528233.3530746"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3618336","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3618336","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T10:53:21Z","timestamp":1755773601000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3618336"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,5]]},"references-count":74,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12,5]]}},"alternative-id":["10.1145\/3618336"],"URL":"https:\/\/doi.org\/10.1145\/3618336","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,5]]},"assertion":[{"value":"2023-12-05","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}