{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T15:41:52Z","timestamp":1759160512838,"version":"3.44.0"},"reference-count":192,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>Markerless human body motion capture promises to remove markers from capture studios, thus simplifying its diverse application fields, from life science to virtual reality. This comprehensive review examines recent advances in real-time markerless motion capture systems from 2020 to 2024, focusing on real-time multi-view, multi-person tracking solutions. Recent advancements, particularly driven by neural network-based pose estimation, have enabled real-time tracking with minimal latency, achieving at least 25 frames per second. Through systematic analysis, we evaluate these methods based on three key metrics: accuracy in pose reconstruction, end-to-end latency, and computational efficiency. Special attention is given to how architectural decisions impact system scalability regarding the number of camera viewpoints and tracked individuals. While current methods show promise for applications like sports analysis and virtual reality, challenges remain in achieving optimal performance across all metrics. Through systematic analysis of leading real-time pipelines, we identify key technical advances and persistent challenges. 
This synthesis provides critical insights for researchers and practitioners working to develop more robust markerless motion capture systems, while outlining important directions for future research.<\/jats:p>","DOI":"10.1145\/3757733","type":"journal-article","created":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T11:35:17Z","timestamp":1756294517000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Comprehensive Review of Real-Time Multi-View Multi-Person Markerless Motion Capture"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9649-4428","authenticated-orcid":false,"given":"Pierre","family":"Nagorny","sequence":"first","affiliation":[{"name":"Artanim Foundation","place":["Geneva, Switzerland"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7814-5128","authenticated-orcid":false,"given":"Bart","family":"Kevelham","sequence":"additional","affiliation":[{"name":"Artanim Foundation","place":["Geneva, Switzerland"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2676-7372","authenticated-orcid":false,"given":"Sylvain","family":"Chagu\u00e9","sequence":"additional","affiliation":[{"name":"Artanim Foundation","place":["Geneva, Switzerland"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7018-885X","authenticated-orcid":false,"given":"Caecilia","family":"Charbonnier","sequence":"additional","affiliation":[{"name":"Artanim Foundation","place":["Geneva, Switzerland"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,29]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","unstructured":"Jake K. Aggarwal and Quin Cai. 1997. Human motion analysis: A review. In Proceedings IEEE Nonrigid and Articulated Motion Workshop. IEEE Comput. 
Soc., San Juan, Puerto Rico, 90\u2013102. DOI:10.1109\/NAMW.1997.609859","DOI":"10.1109\/NAMW.1997.609859"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/MNRAO.1994.346261"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1997.0620"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445888"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.3389\/fspor.2021.809898"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1109\/ICPR.1996.547022","volume-title":"Proceedings of the 13th International Conference on Pattern Recognition","author":"Azarbayejani Ali","year":"1996","unstructured":"Ali Azarbayejani and Alex Pentland. 1996. Real-time self-calibrating stereo person tracking using 3-D shape estimation from blob features. In Proceedings of the 13th International Conference on Pattern Recognition. 627\u2013632. DOI:10.1109\/ICPR.1996.547022"},{"key":"e_1_3_2_8_2","first-page":"1","volume-title":"Proceedings of the SIGGRAPH Asia 2013 Symposium on Mobile Graphics and Interactive Applications.","author":"Bai Huidong","year":"2013","unstructured":"Huidong Bai, Lei Gao, Jihad El-Sana, and Mark Billinghurst. 2013. Markerless 3D gesture-based interaction for handheld augmented reality interfaces. In Proceedings of the SIGGRAPH Asia 2013 Symposium on Mobile Graphics and Interactive Applications. Association for Computing Machinery, New York, NY, USA, 1. DOI:10.1145\/2543651.2543678"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","unstructured":"Nicolas Ballas Li Yao Chris Pal and Aaron Courville. 2016. Delving deeper into convolutional networks for learning video representations. arXiv:1511.06432 [cs]. DOI:10.48550\/arXiv.1511.06432","DOI":"10.48550\/arXiv.1511.06432"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","unstructured":"Eduard Gabriel Bazavan Andrei Zanfir Mihai Zanfir William T. Freeman Rahul Sukthankar and Cristian Sminchisescu. 2022. 
HSPACE: Synthetic parametric humans animated in complex environments. arXiv:2112.12867 [cs]. DOI:10.48550\/arXiv.2112.12867","DOI":"10.48550\/arXiv.2112.12867"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.216"},{"key":"e_1_3_2_12_2","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1007\/978-3-319-16178-5_52","volume-title":"Proceedings of the Computer Vision - ECCV 2014 Workshops.","author":"Belagiannis Vasileios","year":"2015","unstructured":"Vasileios Belagiannis, Xinchao Wang, Bernt Schiele, Pascal Fua, Slobodan Ilic, and Nassir Navab. 2015. Multiple human pose estimation with temporally consistent 3D pictorial structures. In Proceedings of the Computer Vision - ECCV 2014 Workshops. Lourdes Agapito, Michael M. Bronstein, and Carsten Rother (Eds.), Lecture Notes in Computer Science, Springer International Publishing, Cham, 742\u2013754. DOI:10.1007\/978-3-319-16178-5_52"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","unstructured":"Daniel Bermuth Alexander Poeppel and Wolfgang Reif. 2024. VoxelKeypointFusion: Generalizable multi-view multi-person pose estimation. arXiv:2410.18723 [cs.CV]. DOI:10.48550\/arXiv.2410.18723","DOI":"10.48550\/arXiv.2410.18723"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2016.7533003"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","unstructured":"Federica Bogo Angjoo Kanazawa Christoph Lassner Peter Gehler Javier Romero and Michael J. Black. 2016. Keep It SMPL: Automatic estimation of 3D human pose and shape from a single image. arXiv:1607.08128 [cs]. DOI:10.48550\/arXiv.1607.08128","DOI":"10.48550\/arXiv.1607.08128"},{"key":"e_1_3_2_16_2","first-page":"77","volume-title":"Proceedings of the 1st Conference on Fairness, Accountability and Transparency","author":"Buolamwini Joy","year":"2018","unstructured":"Joy Buolamwini and Timnit Gebru. 2018. 
Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency. PMLR, 77\u201391."},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","first-page":"3618","DOI":"10.1109\/CVPR.2013.464","volume-title":"Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition","author":"Burenius Magnus","year":"2013","unstructured":"Magnus Burenius, Josephine Sullivan, and Stefan Carlsson. 2013. 3D pictorial structures for multiple view articulated pose estimation. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. 3618\u20133625. DOI:10.1109\/CVPR.2013.464"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i2.27847"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i2.27849"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2007.05.005"},{"key":"e_1_3_2_21_2","unstructured":"Zhe Cao Gines Hidalgo Tomas Simon Shih-En Wei and Yaser Sheikh. 2019. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. arXiv:1812.08008."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/0262-8856(95)93154-K"},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","unstructured":"Haoming Chen Runyang Feng Sifan Wu Hao Xu Fengcheng Zhou and Zhenguang Liu. 2022. 2D human pose estimation: A survey. arXiv:2204.07370.","DOI":"10.1007\/s00530-022-01019-0"},{"key":"e_1_3_2_25_2","doi-asserted-by":"crossref","unstructured":"Long Chen Haizhou Ai Rui Chen Zijie Zhuang and Shuang Liu. 2021. Cross-view tracking for multi-human 3D pose estimation at over 100 FPS. arXiv:2003.03972.","DOI":"10.1109\/CVPR42600.2020.00334"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","unstructured":"Xianjie Chen and Alan Yuille. 2014. 
Articulated pose estimation by a graphical model with image dependent pairwise relations. arXiv:1407.3399 [cs]. DOI:10.48550\/arXiv.1407.3399","DOI":"10.48550\/arXiv.1407.3399"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","unstructured":"Yuxing Chen Renshu Gu Ouhan Huang and Gangyong Jia. 2022. VTP: Volumetric transformer for multi-view multi-person 3D pose estimation. arXiv:2205.12602. Retrieved from https:\/\/arxiv.org\/abs\/2205.12602. DOI:10.48550\/arXiv.2205.12602","DOI":"10.48550\/arXiv.2205.12602"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2019.102897"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","unstructured":"Yilun Chen Zhicheng Wang Yuxiang Peng Zhiqiang Zhang Gang Yu and Jian Sun. 2018. Cascaded pyramid network for multi-person pose estimation. arXiv:1711.07319 [cs]. DOI:10.48550\/arXiv.1711.07319","DOI":"10.48550\/arXiv.1711.07319"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/21.148408"},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","unstructured":"Bowen Cheng Bin Xiao Jingdong Wang Honghui Shi Thomas S. Huang and Lei Zhang. 2020. HigherHRNet: Scale-aware representation learning for bottom-up human pose estimation. arXiv:1908.10357. Retrieved from https:\/\/arxiv.org\/abs\/1908.10357","DOI":"10.1109\/CVPR42600.2020.00543"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","unstructured":"Rohan Choudhury Kris Kitani and Laszlo A. Jeni. 2023. TEMPO: Efficient multi-view pose estimation tracking and forecasting. arXiv:2309.07910 [cs]. DOI:10.48550\/arXiv.2309.07910","DOI":"10.48550\/arXiv.2309.07910"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40798-018-0139-y"},{"key":"e_1_3_2_34_2","unstructured":"XRMoCap Contributors. 2022. OpenXRLab Multi-view Motion Capture Toolbox and Benchmark. 
Retrieved from https:\/\/github.com\/openxrlab\/xrmocap"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CRV.2005.65"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2004.1334531"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","unstructured":"Junli Deng Haoyuan Yao and Ping Shi. 2023. Enhanced 3D pose estimation in multi-person multi-view scenarios through unsupervised domain adaptation with dropout discriminator. Sensors 23 20 Article 8406 (2023) 17 pages. DOI:10.3390\/s23208406","DOI":"10.3390\/s23208406"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","unstructured":"Yann Desmarais Denis Mottet Pierre Slangen and Philippe Montesinos. 2021. A review of 3D human pose estimation algorithms for markerless motion capture. arXiv:2010.06449 [cs]. DOI:10.48550\/arXiv.2010.06449","DOI":"10.48550\/arXiv.2010.06449"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","unstructured":"Mathis D\u2019Haene Fr\u00e9d\u00e9ric Chorin Serge S. Colson Olivier Gu\u00e9rin Rapha\u00ebl Zory and Elodie Piche. 2024. Validation of a 3D markerless motion capture tool using multiple pose and depth estimations for quantitative gait analysis. Sensors 24 22 Article 7105 (2024) 11 pages. DOI:10.3390\/s24227105","DOI":"10.3390\/s24227105"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","unstructured":"Junting Dong Wen Jiang Qixing Huang Hujun Bao and Xiaowei Zhou. 2019. Fast and robust multi-person 3D pose estimation from multiple views. arXiv:1901.04111 [cs]. DOI:10.48550\/arXiv.1901.04111","DOI":"10.48550\/arXiv.1901.04111"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00530-022-00980-0"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299005"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","unstructured":"Alessio Elmi Davide Mazzini and Pietro Tortella. 2020. Light3DPose: Real-time multi-person 3D pose estimation from multiple views. arXiv:2004.02688 [cs]. 
DOI:10.48550\/arXiv.2004.02688","DOI":"10.48550\/arXiv.2004.02688"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","unstructured":"Matteo Fabbri Fabio Lanzi Simone Calderara Andrea Palazzi Roberto Vezzani and Rita Cucchiara. 2018. Learning to detect and track visible and occluded body joints in a virtual world. arXiv:1803.08319 [cs]. DOI:10.48550\/arXiv.1803.08319","DOI":"10.48550\/arXiv.1803.08319"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2022.3222784"},{"key":"e_1_3_2_46_2","first-page":"2353","volume-title":"Proceedings of the 2017 IEEE International Conference on Computer Vision","author":"Fang Hao-Shu","year":"2017","unstructured":"Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, and Cewu Lu. 2017. RMPE: Regional multi-person pose estimation. In Proceedings of the 2017 IEEE International Conference on Computer Vision. 2353\u20132362. DOI:10.1109\/ICCV.2017.256"},{"key":"e_1_3_2_47_2","volume-title":"Proceedings of the Eurographics\/ACM SIGGRAPH Symposium on Computer Animation - Posters.","author":"Feiz Hossein","year":"2024","unstructured":"Hossein Feiz, David Labb\u00e9, and Sheldon Andrews. 2024. Markerless multi-view multi-person tracking for combat sports. In Proceedings of the Eurographics\/ACM SIGGRAPH Symposium on Computer Animation - Posters. Victor Zordan (Ed.), The Eurographics Association. DOI:10.2312\/sca.20241162"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00724"},{"key":"e_1_3_2_49_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","author":"Fieraru Mihai","year":"2020","unstructured":"Mihai Fieraru, Mihai Zanfir, Elisabeta Oneata, Alin-Ionut Popa, Vlad Olaru, and Cristian Sminchisescu. 2020. Three-dimensional reconstruction of human interactions. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","unstructured":"Dariu Mihai Gavrila and Larry Steven Davis. 1996. 3-D model-based tracking of humans in action: a multi-view approach. In Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 73\u201380. DOI:10.1109\/CVPR.1996.517056","DOI":"10.1109\/CVPR.1996.517056"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","unstructured":"Timnit Gebru Jamie Morgenstern Briana Vecchione Jennifer Wortman Vaughan Hanna Wallach Hal Daum\u00e9 III and Kate Crawford. 2021. Datasheets for datasets. arXiv:1803.09010 [cs]. DOI:10.48550\/arXiv.1803.09010","DOI":"10.48550\/arXiv.1803.09010"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr52729.2023.01253"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.3390\/s16121966"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","unstructured":"Hengkai Guo Tang Tang Guozhong Luo Riwei Chen Yongchen Lu and Linfu Wen. 2019. Multi-domain pose network for multi-person pose estimation and tracking. 11130 (2019) 209\u2013216. arxiv:1810.08338 [cs] DOI:10.1007\/978-3-030-11012-3_17","DOI":"10.1007\/978-3-030-11012-3_17"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","unstructured":"Wen Guo Xiaoyu Bie Xavier Alameda-Pineda and Francesc Moreno-Noguer. 2022. Multi-person extreme motion prediction. arXiv:2105.08825 [cs]. DOI:10.48550\/arXiv.2105.08825","DOI":"10.48550\/arXiv.2105.08825"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","unstructured":"Wen Guo Xiaoyu Bie Francesc Moreno-Noguer and Xavier Alameda-Pineda. 2021. ExPI Dataset. 
DOI:10.5281\/zenodo.7567798","DOI":"10.5281\/zenodo.7567798"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.3390\/app12125862"},{"key":"e_1_3_2_58_2","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1145\/3351095.3372826","volume-title":"Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency","author":"Hanna Alex","year":"2020","unstructured":"Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 501\u2013512. arxiv:1912.03593 [cs] DOI:10.1145\/3351095.3372826"},{"key":"e_1_3_2_59_2","first-page":"2980","volume-title":"Proceedings of the 2017 IEEE International Conference on Computer Vision.","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision. 2980\u20132988. DOI:10.1109\/ICCV.2017.322"},{"key":"e_1_3_2_60_2","first-page":"770","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He Kaiming","year":"2016","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770\u2013778. arxiv:1512.03385 Retrieved from https:\/\/arxiv.org\/abs\/1512.03385"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","unstructured":"Yihui He Rui Yan Katerina Fragkiadaki and Shoou-I. Yu. 2020. Epipolar transformers. arXiv:2005.04551 [cs]. DOI:10.48550\/arXiv.2005.04551","DOI":"10.48550\/arXiv.2005.04551"},{"key":"e_1_3_2_62_2","unstructured":"Gines Hidalgo Yaadhav Raaj Haroon Idrees Donglai Xiang Hanbyul Joo Tomas Simon and Yaser Sheikh. 2019. Single-network whole-body pose estimation. arXiv:1909.13423. 
Retrieved from https:\/\/arxiv.org\/abs\/1909.13423"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2012.2196975"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","unstructured":"Mir Rayat Imtiaz Hossain and James J. Little. 2018. Exploiting temporal information for 3D pose estimation. 69\u201386. arXiv:1711.08585 [cs]. DOI:10.1007\/978-3-030-01249-6_5","DOI":"10.1007\/978-3-030-01249-6_5"},{"key":"e_1_3_2_65_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. (2017). arXiv:1704.04861. Retrieved from https:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_2_66_2","unstructured":"Yinghao Huang Manuel Kaufmann Emre Aksan Michael J. Black Otmar Hilliges and Gerard Pons-Moll. 2018. Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time. (2018). arxiv:1810.04703. Retrieved from https:\/\/arxiv.org\/abs\/1810.04703"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2001.937594"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","unstructured":"Karim Iskakov Egor Burkov Victor Lempitsky and Yury Malkov. 2019. Learnable triangulation of human pose. arXiv:1905.05754 [cs]. DOI:10.48550\/arXiv.1905.05754","DOI":"10.48550\/arXiv.1905.05754"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2009.2027608"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","unstructured":"Tao Jiang Peng Lu Li Zhang Ningsheng Ma Rui Han Chengqi Lyu Yining Li and Kai Chen. 2023. RTMPose: Real-time multi-person pose estimation based on MMPose. arXiv:2303.07399 [cs.CV]. 
DOI:10.48550\/arXiv.2303.07399","DOI":"10.48550\/arXiv.2303.07399"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.5244\/C.24.12"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","unstructured":"Hanbyul Joo Tomas Simon Xulong Li Hao Liu Lei Tan Lin Gui Sean Banerjee Timothy Godisart Bart Nabbe Iain Matthews Takeo Kanade Shohei Nobuhara and Yaser Sheikh. 2016. Panoptic studio: A massively multiview system for social interaction capture. arXiv:1612.03153 [cs]. DOI:10.48550\/arXiv.1612.03153","DOI":"10.48550\/arXiv.1612.03153"},{"key":"e_1_3_2_74_2","doi-asserted-by":"crossref","unstructured":"Abdolrahim Kadkhodamohammadi and Nicolas Padoy. 2019. A generalizable approach for multi-view 3D human pose regression. arXiv:1804.10462. Retrieved from https:\/\/arxiv.org\/abs\/1804.10462","DOI":"10.1007\/s00138-020-01120-2"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","unstructured":"Muhammed Kocabas Nikos Athanasiou and Michael J. Black. 2020. VIBE: Video inference for human body pose and shape estimation. arXiv:1912.05656 [cs]. DOI:10.48550\/arXiv.1912.05656","DOI":"10.48550\/arXiv.1912.05656"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","unstructured":"Muhammed Kocabas Salih Karagoz and Emre Akbas. 2019. Self-supervised learning of 3D human pose Using Multi-view Geometry. arXiv:1903.02330 [cs]. DOI:10.48550\/arXiv.1903.02330","DOI":"10.48550\/arXiv.1903.02330"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","unstructured":"Sven Kreiss Lorenzo Bertoni and Alexandre Alahi. 2019. PifPaf: Composite fields for human pose estimation. arXiv:1903.06593 [cs]. DOI:10.48550\/arXiv.1903.06593","DOI":"10.48550\/arXiv.1903.06593"},{"key":"e_1_3_2_78_2","unstructured":"Sven Kreiss Lorenzo Bertoni and Alexandre Alahi. 2021. OpenPifPaf: Composite fields for semantic keypoint detection and spatio-temporal association. arXiv:2103.02440. 
Retrieved from https:\/\/arxiv.org\/abs\/2103.02440"},{"key":"e_1_3_2_79_2","first-page":"1097","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 25.","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 25. F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), Curran Associates, Inc., 1097\u20131105."},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9939-1956-0078686-7"},{"key":"e_1_3_2_81_2","volume-title":"Proceedings of the AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI (R\u00b2HCAI)","author":"LaChance Julienne","year":"2023","unstructured":"Julienne LaChance, William Thong, Shruti Nagpal, and Alice Xiang. 2023. A case study in fairness evaluation: Current limitations and challenges for human pose estimation. In Proceedings of the AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI (R\u00b2HCAI)."},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12984-023-01186-9"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1016\/0734-189X(85)90094-5"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","unstructured":"Chen Li and Gim Hee Lee. 2021. From synthetic to real: Unsupervised domain adaptation for animal pose estimation. arXiv:2103.14843 [cs]. DOI:10.48550\/arXiv.2103.14843","DOI":"10.48550\/arXiv.2103.14843"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","unstructured":"Jiefeng Li Can Wang Hao Zhu Yihuan Mao Hao-Shu Fang and Cewu Lu. 2019. CrowdPose: Efficient crowded scenes pose estimation and a new benchmark. arXiv:1812.00324 [cs]. 
DOI:10.48550\/arXiv.1812.00324","DOI":"10.48550\/arXiv.1812.00324"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","unstructured":"Junbang Liang and Ming C. Lin. 2019. Shape-aware human pose and shape reconstruction using multi-view images. arXiv:1908.09464 [cs]. DOI:10.48550\/arXiv.1908.09464","DOI":"10.48550\/arXiv.1908.09464"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","unstructured":"Jiahao Lin and Gim Hee Lee. 2021. Multi-view multi-person 3D pose estimation with plane sweep stereo. arXiv:2104.02273 [cs]. DOI:10.48550\/arXiv.2104.02273","DOI":"10.48550\/arXiv.2104.02273"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","unstructured":"Tsung-Yi Lin Michael Maire Serge Belongie Lubomir Bourdev Ross Girshick James Hays Pietro Perona Deva Ramanan C. Lawrence Zitnick and Piotr Doll\u00e1r. 2015. Microsoft COCO: Common objects in context. arXiv:1405.0312. Retrieved from https:\/\/arxiv.org\/abs\/1405.0312. DOI:10.48550\/arXiv.1405.0312","DOI":"10.48550\/arXiv.1405.0312"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","unstructured":"Jian Liu Naveed Akhtar and Ajmal Mian. 2019. Adversarial attack on skeleton-based human action recognition. arXiv:1909.06500. Retrieved from https:\/\/arxiv.org\/abs\/1909.06500. DOI:10.48550\/arXiv.1909.06500","DOI":"10.48550\/arXiv.1909.06500"},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00511"},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524497"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2015.06.013"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818013"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","unstructured":"Jonathon Luiten Georgios Kopanas Bastian Leibe and Deva Ramanan. 2023. Dynamic 3D gaussians: Tracking by persistent dynamic view synthesis. arXiv:2308.09713. Retrieved from https:\/\/arxiv.org\/abs\/2308.09713. 
DOI:10.48550\/arXiv.2308.09713","DOI":"10.48550\/arXiv.2308.09713"},{"key":"e_1_3_2_95_2","volume-title":"Proceedings of the International Conference on 3D Vision.","author":"Mehta Dushyant","year":"2017","unstructured":"Dushyant Mehta, Helge Rhodin, Dan Casas, Pascal Fua, Oleksandr Sotnychenko, Weipeng Xu, and Christian Theobalt. 2017. Monocular 3D human pose estimation in the wild using improved CNN supervision. In Proceedings of the International Conference on 3D Vision."},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.1145\/3386569.3392410"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","unstructured":"Dushyant Mehta Oleksandr Sotnychenko Franziska Mueller Weipeng Xu Srinath Sridhar Gerard Pons-Moll and Christian Theobalt. 2018. Single-shot multi-person 3D pose estimation from monocular RGB. arXiv:1712.03453. Retrieved from https:\/\/arxiv.org\/abs\/1712.03453. DOI:10.48550\/arXiv.1712.03453","DOI":"10.48550\/arXiv.1712.03453"},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","unstructured":"Dushyant Mehta Srinath Sridhar Oleksandr Sotnychenko Helge Rhodin Mohammad Shafiei Hans-Peter Seidel Weipeng Xu Dan Casas and Christian Theobalt. 2017. VNect: Real-time 3D human pose estimation with a single RGB camera. ACM Transactions on Graphics 36 4 (2017) 1\u201314. DOI:10.1145\/3072959.3073596","DOI":"10.1145\/3072959.3073596"},{"key":"e_1_3_2_99_2","doi-asserted-by":"publisher","unstructured":"Pierre Merriaux Yohan Dupuis R\u00e9mi Boutteau Pascal Vasseur and Xavier Savatier. 2017. A study of vicon system positioning performance. Sensors 17 7 Article 1591 (2017) 18 pages. 
DOI:10.3390\/s17071591","DOI":"10.3390\/s17071591"},{"key":"e_1_3_2_100_2","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1145\/3287560.3287596","volume-title":"Proceedings of the Conference on Fairness, Accountability, and Transparency.","author":"Mitchell Margaret","year":"2019","unstructured":"Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. Association for Computing Machinery, New York, NY, USA, 220\u2013229. DOI:10.1145\/3287560.3287596"},{"key":"e_1_3_2_101_2","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.2000.0897"},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2006.08.002"},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3010248"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","unstructured":"Alejandro Newell Zhiao Huang and Jia Deng. 2017. Associative embedding: End-to-end learning for joint detection and grouping. arXiv:1611.05424 [cs]. DOI:10.48550\/arXiv.1611.05424","DOI":"10.48550\/arXiv.1611.05424"},{"key":"e_1_3_2_105_2","doi-asserted-by":"publisher","unstructured":"Ana Filipa Rodrigues Nogueira H\u00e9lder P. Oliveira and Lu\u00eds F. Teixeira. 2024. Markerless multi-view 3D human pose estimation: A survey. arXiv:2407.03817 [cs.CV]. DOI:10.1016\/j.imavis.2025.105437","DOI":"10.1016\/j.imavis.2025.105437"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58539-6_36"},{"key":"e_1_3_2_107_2","doi-asserted-by":"crossref","unstructured":"Daniil Osokin. 2018. Real-time 2D multi-person pose estimation on CPU: Lightweight OpenPose. arXiv:1811.12004. 
Retrieved from https:\/\/arxiv.org\/abs\/1811.12004","DOI":"10.5220\/0007555407440748"},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr.2019.01123"},{"key":"e_1_3_2_109_2","doi-asserted-by":"publisher","DOI":"10.3390\/s140304189"},{"key":"e_1_3_2_110_2","doi-asserted-by":"publisher","unstructured":"Leonid Pishchulin Eldar Insafutdinov Siyu Tang Bjoern Andres Mykhaylo Andriluka Peter Gehler and Bernt Schiele. 2016. DeepCut: Joint subset partition and labeling for multi person pose estimation. arXiv:1511.06645 [cs]. DOI:10.48550\/arXiv.1511.06645","DOI":"10.48550\/arXiv.1511.06645"},{"key":"e_1_3_2_111_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2006.10.016"},{"key":"e_1_3_2_112_2","doi-asserted-by":"publisher","unstructured":"Haibo Qiu Chunyu Wang Jingdong Wang Naiyan Wang and Wenjun Zeng. 2019. Cross view fusion for 3D human pose estimation. arXiv:1909.01203 [cs]. DOI:10.48550\/arXiv.1909.01203","DOI":"10.48550\/arXiv.1909.01203"},{"key":"e_1_3_2_113_2","doi-asserted-by":"crossref","unstructured":"Yaadhav Raaj Haroon Idrees Gines Hidalgo and Yaser Sheikh. 2019. Efficient online multi-person 2D pose tracking with recurrent spatio-temporal affinity fields. arXiv:1811.11975. Retrieved from https:\/\/arxiv.org\/abs\/1811.11975","DOI":"10.1109\/CVPR.2019.00475"},{"key":"e_1_3_2_114_2","first-page":"II\u2013II","volume-title":"Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition","author":"Ramanan Deva","year":"2003","unstructured":"Deva Ramanan and David A. Forsyth. 2003. Finding and tracking people from the bottom up. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. II\u2013II. 
DOI:10.1109\/CVPR.2003.1211504"},{"key":"e_1_3_2_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.335"},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01494"},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","unstructured":"Joseph Redmon Santosh Divvala Ross Girshick and Ali Farhadi. 2016. You only look once: Unified real-time object detection. arXiv:1506.02640 [cs]. DOI:10.48550\/arXiv.1506.02640","DOI":"10.48550\/arXiv.1506.02640"},{"key":"e_1_3_2_118_2","doi-asserted-by":"crossref","unstructured":"Edoardo Remelli Shangchen Han Sina Honari Pascal Fua and Robert Wang. 2020. Lightweight multi-view 3D pose estimation through camera-disentangled representation. arXiv:2004.02186. Retrieved from https:\/\/arxiv.org\/abs\/2004.02186","DOI":"10.1109\/CVPR42600.2020.00608"},{"key":"e_1_3_2_119_2","first-page":"91","volume-title":"Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1 (Montreal, Canada). MIT Press, Cambridge, MA, USA, 91\u201399."},{"key":"e_1_3_2_120_2","volume-title":"Civilian American and European Surface Anthropometry Resource Final Report AFRL-HE-WP-TR-2002-0169","author":"Robinette Kathleen","year":"2002","unstructured":"Kathleen Robinette, Sherri Blackwell, Hein Daanen, Mark Boehmer, Scott Fleming, Tina Brill, David Hoeferlin, and Dennis Burnsides. 2002. Civilian American and European Surface Anthropometry Resource Final Report AFRL-HE-WP-TR-2002-0169. Technical Report. 
US Air Force Research Laboratory."},{"key":"e_1_3_2_121_2","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130883"},{"key":"e_1_3_2_122_2","first-page":"506","volume-title":"Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)","author":"Rosales R\u00f2mer","year":"2000","unstructured":"R\u00f2mer Rosales and Stan Sclaroff. 2000. Learning and synthesizing human body motion and posture. In Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580). 506\u2013511. DOI:10.1109\/AFGR.2000.840681"},{"key":"e_1_3_2_123_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2022.3145494"},{"key":"e_1_3_2_124_2","doi-asserted-by":"crossref","unstructured":"Mark Sandler Andrew Howard Menglong Zhu Andrey Zhmoginov and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. (2018). arXiv:1801.04381. Retrieved from https:\/\/arxiv.org\/abs\/1801.04381","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_125_2","doi-asserted-by":"crossref","first-page":"3674","DOI":"10.1109\/CVPR.2013.471","volume-title":"Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition","author":"Sapp Ben","year":"2013","unstructured":"Ben Sapp and Ben Taskar. 2013. MODEC: Multimodal decomposable models for human pose estimation. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Portland, OR, USA, 3674\u20133681. DOI:10.1109\/CVPR.2013.471"},{"key":"e_1_3_2_126_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2016.09.002"},{"key":"e_1_3_2_127_2","first-page":"2326","volume-title":"Proceedings of the 2018 24th International Conference on Pattern Recognition.","author":"Schwarcz Steven","year":"2018","unstructured":"Steven Schwarcz and Thomas Pollard. 2018. 3D human pose estimation from deep multi-view 2D pose. 
In Proceedings of the 2018 24th International Conference on Pattern Recognition. 2326\u20132331. arxiv:1902.02841 [cs] DOI:10.1109\/ICPR.2018.8545631"},{"key":"e_1_3_2_128_2","doi-asserted-by":"publisher","unstructured":"Sahil Shah Naman Jain Abhishek Sharma and Arjun Jain. 2021. On the robustness of human pose estimation. arXiv:1908.06401 [cs]. DOI:10.48550\/arXiv.1908.06401","DOI":"10.48550\/arXiv.1908.06401"},{"key":"e_1_3_2_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01079"},{"key":"e_1_3_2_130_2","doi-asserted-by":"publisher","DOI":"10.1145\/2398356.2398381"},{"key":"e_1_3_2_131_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2022.3188716"},{"key":"e_1_3_2_132_2","doi-asserted-by":"publisher","DOI":"10.1145\/3618336"},{"key":"e_1_3_2_133_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0273-6"},{"key":"e_1_3_2_134_2","doi-asserted-by":"publisher","unstructured":"Tomas Simon Hanbyul Joo Iain Matthews and Yaser Sheikh. 2017. Hand keypoint detection in single images using multiview bootstrapping. arXiv:1704.07809 [cs]. DOI:10.48550\/arXiv.1704.07809","DOI":"10.48550\/arXiv.1704.07809"},{"key":"e_1_3_2_135_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, May 7-9, 2015.","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, May 7-9, 2015. Yoshua Bengio and Yann LeCun (Eds.), arxiv:1409.1556 [cs.CV]"},{"key":"e_1_3_2_136_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10445922"},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132272.3134113"},{"key":"e_1_3_2_138_2","first-page":"2502","volume-title":"Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
IEEE","author":"Srivastav Vinkle","year":"2024","unstructured":"Vinkle Srivastav, Keqi Chen, and Nicolas Padoy. 2024. SelfPose3d: Self-supervised multi-person multi-view 3d pose estimation. In Proceedings of the 2024 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2502\u20132512. DOI:10.1109\/CVPR52733.2024.00242"},{"key":"e_1_3_2_139_2","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1109\/ICCV.2011.6126338","volume-title":"Proceedings of the 2011 International Conference on Computer Vision","author":"Stoll Carsten","year":"2011","unstructured":"Carsten Stoll, Nils Hasler, Juergen Gall, Hans-Peter Seidel, and Christian Theobalt. 2011. Fast articulated motion tracking using a sums of gaussians body model. In Proceedings of the 2011 International Conference on Computer Vision. IEEE, Barcelona, Spain, 951\u2013958. DOI:10.1109\/ICCV.2011.6126338"},{"key":"e_1_3_2_140_2","doi-asserted-by":"publisher","unstructured":"Ke Sun Bin Xiao Dong Liu and Jingdong Wang. 2019. Deep high-resolution representation learning for human pose estimation. arXiv:1902.09212 [cs]. DOI:10.48550\/arXiv.1902.09212","DOI":"10.48550\/arXiv.1902.09212"},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","unstructured":"Yu Sun Qian Bao Wu Liu Yili Fu Michael J. Black and Tao Mei. 2021. Monocular one-stage regression of multiple 3D people. arXiv:2008.12272 [cs]. DOI:10.48550\/arXiv.2008.12272","DOI":"10.48550\/arXiv.2008.12272"},{"key":"e_1_3_2_142_2","first-page":"6105","volume-title":"Proceedings of the 36th International Conference on Machine Learning.","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning. Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), PMLR, 6105\u20136114. 
arxiv:1905.11946 [cs.LG] Retrieved from https:\/\/proceedings.mlr.press\/v97\/tan19a.html"},{"key":"e_1_3_2_143_2","first-page":"5026","volume-title":"Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Todorov Emanuel","year":"2012","unstructured":"Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems. 5026\u20135033. DOI:10.1109\/IROS.2012.6386109"},{"key":"e_1_3_2_144_2","doi-asserted-by":"publisher","unstructured":"Jonathan Tompson Arjun Jain Yann LeCun and Christoph Bregler. 2014. Joint training of a convolutional network and a graphical model for human pose estimation. arXiv:1406.2984 [cs]. DOI:10.48550\/arXiv.1406.2984","DOI":"10.48550\/arXiv.1406.2984"},{"key":"e_1_3_2_145_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533384"},{"key":"e_1_3_2_146_2","doi-asserted-by":"crossref","first-page":"1653","DOI":"10.1109\/CVPR.2014.214","volume-title":"Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition","author":"Toshev Alexander","year":"2014","unstructured":"Alexander Toshev and Christian Szegedy. 2014. DeepPose: Human pose estimation via deep neural networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. 1653\u20131660. arxiv:1312.4659 [cs] DOI:10.1109\/CVPR.2014.214"},{"key":"e_1_3_2_147_2","volume-title":"Proceedings of the 2017 British Machine Vision Conference","author":"Trumble Matt","year":"2017","unstructured":"Matt Trumble, Andrew Gilbert, Charles Malleson, Adrian Hilton, and John Collomosse. 2017. Total capture: 3D human pose estimation fusing video and inertial sensors. In Proceedings of the 2017 British Machine Vision Conference."},{"key":"e_1_3_2_148_2","doi-asserted-by":"publisher","unstructured":"Hanyue Tu Chunyu Wang and Wenjun Zeng. 2020. 
VoxelPose: Towards multi-camera 3D human pose estimation in wild environment. arXiv:2004.06239 [cs]. DOI:10.48550\/arXiv.2004.06239","DOI":"10.48550\/arXiv.2004.06239"},{"key":"e_1_3_2_149_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.492"},{"key":"e_1_3_2_150_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. arXiv:1706.03762 [cs]."},{"key":"e_1_3_2_151_2","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Marcard Timo von","year":"2018","unstructured":"Timo von Marcard, Roberto Henschel, Michael J. Black, Bodo Rosenhahn, and Gerard Pons-Moll. 2018. Recovering accurate 3D human pose in the wild using IMUs and a moving camera. In Proceedings of the European Conference on Computer Vision."},{"key":"e_1_3_2_152_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13131"},{"key":"e_1_3_2_153_2","unstructured":"Ronny Votel Na Li Francois Beletti Yu-Hui Chen and Ard Oerlemans. 2021. Next-generation pose detection with MoveNet and TensorFlow.js. https:\/\/blog.tensorflow.org\/2021\/05\/next-generation-pose-detection-with-movenet-and-tensorflowjs.html"},{"key":"e_1_3_2_154_2","doi-asserted-by":"publisher","DOI":"10.7717\/peerj.12995"},{"key":"e_1_3_2_155_2","doi-asserted-by":"publisher","unstructured":"Bastian Wandt and Bodo Rosenhahn. 2019. RepNet: Weakly supervised training of an adversarial reprojection network for 3D human pose estimation. arXiv:1902.09868 [cs]. DOI:10.48550\/arXiv.1902.09868","DOI":"10.48550\/arXiv.1902.09868"},{"key":"e_1_3_2_156_2","doi-asserted-by":"publisher","unstructured":"Chien-Yao Wang Alexey Bochkovskiy and Hong-Yuan Mark Liao. 2022. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv:2207.02696 [cs]. 
DOI:10.48550\/arXiv.2207.02696","DOI":"10.48550\/arXiv.2207.02696"},{"key":"e_1_3_2_157_2","doi-asserted-by":"publisher","DOI":"10.1109\/tcds.2022.3185146"},{"key":"e_1_3_2_158_2","doi-asserted-by":"publisher","unstructured":"Jiahang Wang Sheng Jin Wentao Liu Weizhong Liu Chen Qian and Ping Luo. 2021. When human pose estimation meets robustness: Adversarial algorithms and benchmarks. arXiv:2105.06152 [cs]. DOI:10.48550\/arXiv.2105.06152","DOI":"10.48550\/arXiv.2105.06152"},{"key":"e_1_3_2_159_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2021.103225"},{"key":"e_1_3_2_160_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(02)00100-0"},{"key":"e_1_3_2_161_2","unstructured":"Tao Wang Jianfeng Zhang Yujun Cai Shuicheng Yan and Jiashi Feng. 2021. Direct multi-view multi-person 3D pose estimation. arXiv:2111.04076 [cs]"},{"key":"e_1_3_2_162_2","doi-asserted-by":"publisher","unstructured":"Shih-En Wei Varun Ramakrishna Takeo Kanade and Yaser Sheikh. 2016. Convolutional pose machines. arXiv:1602.00134 [cs]. DOI:10.48550\/arXiv.1602.00134","DOI":"10.48550\/arXiv.1602.00134"},{"key":"e_1_3_2_163_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2006.07.013"},{"key":"e_1_3_2_164_2","doi-asserted-by":"publisher","unstructured":"Benjamin Wilson Judy Hoffman and Jamie Morgenstern. 2019. Predictive inequity in object detection. arXiv:1902.11097 [cs stat]. DOI:10.48550\/arXiv.1902.11097","DOI":"10.48550\/arXiv.1902.11097"},{"key":"e_1_3_2_165_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.598236"},{"key":"e_1_3_2_166_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-006-0027-7"},{"key":"e_1_3_2_167_2","first-page":"1480","volume-title":"Proceedings of the 2019 IEEE International Conference on Multimedia and Expo.","author":"Wu Jiahong","year":"2019","unstructured":"Jiahong Wu, He Zheng, Bo Zhao, Yixin Li, Baoming Yan, Rui Liang, Wenjia Wang, Shipei Zhou, Guosen Lin, Yanwei Fu, Yizhou Wang, and Yonggang Wang. 2019. 
AI Challenger: A large-scale dataset for going deeper in image understanding. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo. 1480\u20131485. arxiv:1711.06475 [cs] DOI:10.1109\/ICME.2019.00256"},{"key":"e_1_3_2_168_2","first-page":"11128","volume-title":"Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision.","author":"Wu Size","year":"2021","unstructured":"Size Wu, Sheng Jin, Wentao Liu, Lei Bai, Chen Qian, Dong Liu, and Wanli Ouyang. 2021. Graph-based 3D multi-person pose estimation using multi-view images. In Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision. IEEE, Montreal, QC, Canada, 11128\u201311137. DOI:10.1109\/ICCV48922.2021.01096"},{"key":"e_1_3_2_169_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2545669"},{"key":"e_1_3_2_170_2","doi-asserted-by":"publisher","unstructured":"Alice Xiang. 2022. Being \u2018Seen\u2019 vs. \u2018Mis-Seen\u2019: Tensions between privacy and fairness in computer vision. Harvard Journal of Law & Technology 36 1 (Feb. 2022) 1\u201361. DOI:10.2139\/ssrn.4068921","DOI":"10.2139\/ssrn.4068921"},{"key":"e_1_3_2_171_2","first-page":"6183","volume-title":"Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","author":"Xu Hongyi","year":"2020","unstructured":"Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, William T. Freeman, Rahul Sukthankar, and Cristian Sminchisescu. 2020. GHUM and GHUML: Generative 3D human shape and articulated pose models. In Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Seattle, WA, USA, 6183\u20136192. 
DOI:10.1109\/CVPR42600.2020.00622"},{"key":"e_1_3_2_172_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3414040"},{"key":"e_1_3_2_173_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2204.12484"},{"key":"e_1_3_2_174_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2023.3330016"},{"key":"e_1_3_2_175_2","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/CVPR.2011.5995741","volume-title":"Proceedings of the CVPR 2011","author":"Yang Yi","year":"2011","unstructured":"Yi Yang and Deva Ramanan. 2011. Articulated pose estimation with flexible mixtures-of-parts. In Proceedings of the CVPR 2011. 1385\u20131392. DOI:10.1109\/CVPR.2011.5995741"},{"key":"e_1_3_2_176_2","doi-asserted-by":"publisher","unstructured":"Zhitao Yang Zhongang Cai Haiyi Mei Shuai Liu Zhaoxi Chen Weiye Xiao Yukun Wei Zhongfei Qing Chen Wei Bo Dai Wayne Wu Chen Qian Dahua Lin Ziwei Liu and Lei Yang. 2023. SynBody: Synthetic dataset with layered human models for 3D human perception and modeling. arXiv:2303.17368 [cs]. DOI:10.48550\/arXiv.2303.17368","DOI":"10.48550\/arXiv.2303.17368"},{"key":"e_1_3_2_177_2","doi-asserted-by":"publisher","unstructured":"Hang Ye Wentao Zhu Chunyu Wang Rujie Wu and Yizhou Wang. 2022. Faster VoxelPose: Real-time 3D human pose estimation by orthographic projection. arXiv:2207.10955 [cs]. DOI:10.48550\/arXiv.2207.10955","DOI":"10.48550\/arXiv.2207.10955"},{"key":"e_1_3_2_178_2","doi-asserted-by":"publisher","unstructured":"Yifei Yin Chen Guo Manuel Kaufmann Juan Jose Zarate Jie Song and Otmar Hilliges. 2023. Hi4D: 4D instance segmentation of close human interaction. arXiv:2303.15380 [cs]. DOI:10.48550\/arXiv.2303.15380","DOI":"10.48550\/arXiv.2303.15380"},{"key":"e_1_3_2_179_2","first-page":"17016","volume-title":"Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE","author":"Yin Yifei","year":"2023","unstructured":"Yifei Yin, Chen Guo, Manuel Kaufmann, Juan Jose Zarate, Jie Song, and Otmar Hilliges. 
2023. Hi4D: 4D instance segmentation of close human interaction. In Proceedings of the 2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 17016\u201317027. DOI:10.1109\/cvpr52729.2023.01632"},{"key":"e_1_3_2_180_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2000.895422"},{"key":"e_1_3_2_181_2","doi-asserted-by":"publisher","unstructured":"Zhixuan Yu Linguang Zhang Yuanlu Xu Chengcheng Tang Luan Tran Cem Keskin and Hyun Soo Park. 2022. Multiview human body reconstruction from uncalibrated cameras. In Proceedings of the 36th International Conference on Neural Information Processing Systems 35 Article 572 (2022) 13 pages. DOI:10.5555\/3600270.3600842","DOI":"10.5555\/3600270.3600842"},{"key":"e_1_3_2_182_2","doi-asserted-by":"publisher","unstructured":"Jiabin Zhang Zheng Zhu Wei Zou Peng Li Yanwei Li Hu Su and Guan Huang. 2019. FastPose: Towards real-time pose estimation and tracking via scale-normalized multi-task networks. arXiv:1908.05593 [cs]. DOI:10.48550\/arXiv.1908.05593","DOI":"10.48550\/arXiv.1908.05593"},{"key":"e_1_3_2_183_2","doi-asserted-by":"crossref","unstructured":"Yuxiang Zhang Liang An Tao Yu Xiu Li Kun Li and Yebin Liu. 2020. 4D association graph for realtime multi-person motion capture using multiple video cameras. arXiv:2002.12625. Retrieved from https:\/\/arxiv.org\/abs\/2002.12625","DOI":"10.1109\/CVPR42600.2020.00140"},{"key":"e_1_3_2_184_2","doi-asserted-by":"crossref","unstructured":"Yuxiang Zhang Zhe Li Liang An Mengcheng Li Tao Yu and Yebin Liu. 2021. Lightweight multi-person total motion capture using sparse multi-view cameras. arXiv:2108.10378. Retrieved from https:\/\/arxiv.org\/abs\/2108.10378","DOI":"10.1109\/ICCV48922.2021.00551"},{"key":"e_1_3_2_185_2","doi-asserted-by":"publisher","unstructured":"Yifu Zhang Chunyu Wang Xinggang Wang Wenyu Liu and Wenjun Zeng. 2021. VoxelTrack: Multi-person 3D human pose estimation and tracking in the wild. arXiv:2108.02452 [cs]. 
DOI:10.48550\/arXiv.2108.02452","DOI":"10.48550\/arXiv.2108.02452"},{"key":"e_1_3_2_186_2","first-page":"2197","volume-title":"Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE","author":"Zhang Zhe","year":"2020","unstructured":"Zhe Zhang, Chunyu Wang, Wenhu Qin, and Wenjun Zeng. 2020. Fusing wearable IMUs with multi-view images for human pose estimation: A geometric approach. In Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2197\u20132206. DOI:10.1109\/cvpr42600.2020.00227"},{"key":"e_1_3_2_187_2","doi-asserted-by":"publisher","unstructured":"Ce Zheng Wenhan Wu Chen Chen Taojiannan Yang Sijie Zhu Ju Shen Nasser Kehtarnavaz and Mubarak Shah. 2020. Deep learning-based human pose estimation: A survey. arXiv:2012.13392 [cs]. DOI:10.48550\/arXiv.2012.13392","DOI":"10.48550\/arXiv.2012.13392"},{"key":"e_1_3_2_188_2","doi-asserted-by":"publisher","unstructured":"Ce Zheng Wenhan Wu Chen Chen Taojiannan Yang Sijie Zhu Ju Shen Nasser Kehtarnavaz and Mubarak Shah. 2023. Deep learning-based human pose estimation: A survey. arXiv:2012.13392 [cs]. DOI:10.48550\/arXiv.2012.13392","DOI":"10.48550\/arXiv.2012.13392"},{"key":"e_1_3_2_189_2","doi-asserted-by":"publisher","unstructured":"Yang Zheng Ruizhi Shao Yuxiang Zhang Tao Yu Zerong Zheng Qionghai Dai and Yebin Liu. 2021. DeepMultiCap: Performance capture of multiple characters using sparse multiview cameras. arXiv:2105.00261 [cs]. DOI:10.48550\/arXiv.2105.00261","DOI":"10.48550\/arXiv.2105.00261"},{"key":"e_1_3_2_190_2","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence and 36th Conference on Innovative Applications of Artificial Intelligence and 14th Symposium on Educational Advances in Artificial Intelligence.","author":"Zhou Feng","year":"2024","unstructured":"Feng Zhou, Jianqin Yin, and Peiyang Li. 2024. Lifting by image - leveraging image cues for accurate 3D human pose estimation. 
In Proceedings of the 38th AAAI Conference on Artificial Intelligence and 36th Conference on Innovative Applications of Artificial Intelligence and 14th Symposium on Educational Advances in Artificial Intelligence. AAAI Press, Article 848, 9 pages. DOI:10.1609\/aaai.v38i7.28596"},{"key":"e_1_3_2_191_2","doi-asserted-by":"publisher","unstructured":"Xiaowei Zhou Spyridon Leonardos Xiaoyan Hu and Kostas Daniilidis. 2015. 3D shape estimation from 2D landmarks: A convex relaxation approach. arXiv:1411.2942 [cs]. DOI:10.48550\/arXiv.1411.2942","DOI":"10.48550\/arXiv.1411.2942"},{"key":"e_1_3_2_192_2","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530746"},{"key":"e_1_3_2_193_2","first-page":"1763","volume-title":"Proceedings of the 15th Asian Conference on Machine Learning.","author":"Zhuang Zonghuang","year":"2024","unstructured":"Zonghuang Zhuang and Yue Zhou. 2024. FasterVoxelPose+: Fast and accurate voxel-based 3D human pose estimation by depth-wise projection decay. In Proceedings of the 15th Asian Conference on Machine Learning. Berrin Yaniko\u011flu and Wray Buntine (Eds.), PMLR, 1763\u20131778."}],"container-title":["ACM Computing 
Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3757733","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T15:20:58Z","timestamp":1759159258000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3757733"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,29]]},"references-count":192,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3757733"],"URL":"https:\/\/doi.org\/10.1145\/3757733","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"type":"print","value":"0360-0300"},{"type":"electronic","value":"1557-7341"}],"subject":[],"published":{"date-parts":[[2025,9,29]]},"assertion":[{"value":"2024-04-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}