{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T16:17:49Z","timestamp":1780762669374,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":52,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Scientific and Technological Innovation Foundation of Shunde Graduate School, USTB","award":["No. BK19CE017"],"award-info":[{"award-number":["No. BK19CE017"]}]},{"name":"Scientific and Technological Innovation of Shunde Graduate School of University of Science and Technology Beijing","award":["No. BK20AE004"],"award-info":[{"award-number":["No. BK20AE004"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3547871","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:35Z","timestamp":1665416555000},"page":"6174-6182","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation"],"prefix":"10.1145","author":[{"given":"Zhongwei","family":"Qiu","sequence":"first","affiliation":[{"name":"University of Science and Technology Beijing, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qiansheng","family":"Yang","sequence":"additional","affiliation":[{"name":"Baidu, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jian","family":"Wang","sequence":"additional","affiliation":[{"name":"Baidu, Beijing, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dongmei","family":"Fu","sequence":"additional","affiliation":[{"name":"University of Science and Technology Beijing, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Anurag Arnab Carl Doersch and Andrew Zisserman. 2019. Exploiting temporal context for 3D human pose estimation in the wild. In CVPR. 3395--3404.  Anurag Arnab Carl Doersch and Andrew Zisserman. 2019. Exploiting temporal context for 3D human pose estimation in the wild. In CVPR. 3395--3404.","DOI":"10.1109\/CVPR.2019.00351"},{"key":"e_1_3_2_2_2_1","unstructured":"Gedas Bertasius Heng Wang and Lorenzo Torresani. 2021. Is Space-Time Attention All You Need for Video Understanding?. In ICML. PMLR 813--824.  Gedas Bertasius Heng Wang and Lorenzo Torresani. 2021. Is Space-Time Attention All You Need for Video Understanding?. In ICML. PMLR 813--824."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Yujun Cai Liuhao Ge Jun Liu Jianfei Cai Tat-Jen Cham Junsong Yuan and Nadia Magnenat Thalmann. 2019. Exploiting spatial-temporal relationships for 3d pose estimation via graph convolutional networks. In ICCV. 2272--2281.  Yujun Cai Liuhao Ge Jun Liu Jianfei Cai Tat-Jen Cham Junsong Yuan and Nadia Magnenat Thalmann. 2019. Exploiting spatial-temporal relationships for 3d pose estimation via graph convolutional networks. In ICCV. 2272--2281.","DOI":"10.1109\/ICCV.2019.00236"},{"key":"e_1_3_2_2_4_1","first-page":"198","article-title":"Anatomy-aware 3d human pose estimation with bone-based pose decomposition","volume":"32","author":"Chen Tianlang","year":"2021","unstructured":"Tianlang Chen , Chen Fang , Xiaohui Shen , Yiheng Zhu , Zhili Chen , and Jiebo Luo . 2021 . Anatomy-aware 3d human pose estimation with bone-based pose decomposition . 
TCSVT 32 , 1 (2021), 198 -- 209 . Tianlang Chen, Chen Fang, Xiaohui Shen, Yiheng Zhu, Zhili Chen, and Jiebo Luo. 2021. Anatomy-aware 3d human pose estimation with bone-based pose decomposition. TCSVT 32, 1 (2021), 198--209.","journal-title":"TCSVT"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"crossref","unstructured":"Yu Cheng Bo Wang Bo Yang and Robby T Tan. 2021. Monocular 3D multiperson pose estimation by integrating top-down and bottom-up networks. In CVPR. 7649--7659.  Yu Cheng Bo Wang Bo Yang and Robby T Tan. 2021. Monocular 3D multiperson pose estimation by integrating top-down and bottom-up networks. In CVPR. 7649--7659.","DOI":"10.1109\/CVPR46437.2021.00756"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6689"},{"key":"e_1_3_2_2_7_1","volume-title":"Ju Yong Chang, and Kyoung Mu Lee","author":"Choi Hongsuk","year":"2021","unstructured":"Hongsuk Choi , Gyeongsik Moon , Ju Yong Chang, and Kyoung Mu Lee . 2021 . Beyond static features for temporally consistent 3d human pose and shape from a video. In CVPR. 1964--1973. Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, and Kyoung Mu Lee. 2021. Beyond static features for temporally consistent 3d human pose and shape from a video. In CVPR. 1964--1973."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Rishabh Dabral Anurag Mundhada Uday Kusupati Safeer Afaque Abhishek Sharma and Arjun Jain. 2018. Learning 3d human pose from structure and motion. In ECCV. 668--683.  Rishabh Dabral Anurag Mundhada Uday Kusupati Safeer Afaque Abhishek Sharma and Arjun Jain. 2018. Learning 3d human pose from structure and motion. In ECCV. 668--683.","DOI":"10.1007\/978-3-030-01240-3_41"},{"key":"e_1_3_2_2_9_1","volume-title":"Sim2real transfer learning for 3d human pose estimation: motion to the rescue. NeurIPS 32","author":"Doersch Carl","year":"2019","unstructured":"Carl Doersch and Andrew Zisserman . 2019. Sim2real transfer learning for 3d human pose estimation: motion to the rescue. 
NeurIPS 32 ( 2019 ). Carl Doersch and Andrew Zisserman. 2019. Sim2real transfer learning for 3d human pose estimation: motion to the rescue. NeurIPS 32 (2019)."},{"key":"e_1_3_2_2_10_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In ICLR.  Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In ICLR."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Matteo Fabbri Fabio Lanzi Simone Calderara Stefano Alletto and Rita Cucchiara. 2020. Compressed volumetric heatmaps for multi-person 3d pose estimation. In CVPR. 7204--7213.  Matteo Fabbri Fabio Lanzi Simone Calderara Stefano Alletto and Rita Cucchiara. 2020. Compressed volumetric heatmaps for multi-person 3d pose estimation. In CVPR. 7204--7213.","DOI":"10.1109\/CVPR42600.2020.00723"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02291478"},{"key":"e_1_3_2_2_13_1","unstructured":"Mir Rayat Imtiaz Hossain and James J Little. 2018. Exploiting temporal information for 3d human pose estimation. In ECCV. 68--84.  Mir Rayat Imtiaz Hossain and James J Little. 2018. Exploiting temporal information for 3d human pose estimation. In ECCV. 68--84."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Yupan Huang Hongwei Xue Bei Liu and Yutong Lu. 2021. Unifying multimodal transformer for bi-directional image and text generation. In ACM MM22. 1138--1147.  Yupan Huang Hongwei Xue Bei Liu and Yutong Lu. 2021. Unifying multimodal transformer for bi-directional image and text generation. In ACM MM22. 
1138--1147.","DOI":"10.1145\/3474085.3481540"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2782743"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"crossref","unstructured":"Angjoo Kanazawa Jason Y Zhang Panna Felsen and Jitendra Malik. 2019. Learning 3d human dynamics from video. In CVPR. 5614--5623.  Angjoo Kanazawa Jason Y Zhang Panna Felsen and Jitendra Malik. 2019. Learning 3d human dynamics from video. In CVPR. 5614--5623.","DOI":"10.1109\/CVPR.2019.00576"},{"key":"e_1_3_2_2_18_1","volume-title":"Vibe: Video inference for human body pose and shape estimation. In CVPR. 5253--5263.","author":"Kocabas Muhammed","year":"2020","unstructured":"Muhammed Kocabas , Nikos Athanasiou , and Michael J Black . 2020 . Vibe: Video inference for human body pose and shape estimation. In CVPR. 5253--5263. Muhammed Kocabas, Nikos Athanasiou, and Michael J Black. 2020. Vibe: Video inference for human body pose and shape estimation. In CVPR. 5253--5263."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Nikos Kolotouros Georgios Pavlakos Michael J Black and Kostas Daniilidis. 2019. Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In ICCV. 2252--2261.  Nikos Kolotouros Georgios Pavlakos Michael J Black and Kostas Daniilidis. 2019. Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In ICCV. 2252--2261.","DOI":"10.1109\/ICCV.2019.00234"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Ke Li Shijie Wang Xiang Zhang Yifan Xu Weijian Xu and Zhuowen Tu. 2021. Pose recognition with cascade transformers. In CVPR. 1944--1953.  Ke Li Shijie Wang Xiang Zhang Yifan Xu Weijian Xu and Zhuowen Tu. 2021. Pose recognition with cascade transformers. In CVPR. 
1944--1953.","DOI":"10.1109\/CVPR46437.2021.00198"},{"key":"e_1_3_2_2_21_1","unstructured":"Yong-Lu Li Xinpeng Liu Han Lu Shiyi Wang Junqi Liu Jiefeng Li and Cewu Lu. 2020. Detailed 2d-3d joint representation for human-object interaction. In CVPR. 10166--10175.  Yong-Lu Li Xinpeng Liu Han Lu Shiyi Wang Junqi Liu Jiefeng Li and Cewu Lu. 2020. Detailed 2d-3d joint representation for human-object interaction. In CVPR. 10166--10175."},{"key":"e_1_3_2_2_22_1","unstructured":"Jiahao Lin and Gim Hee Lee. 2019. Trajectory space factorization for deep video-based 3d human pose estimation. In BMVC.  Jiahao Lin and Gim Hee Lee. 2019. Trajectory space factorization for deep video-based 3d human pose estimation. In BMVC."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Kevin Lin Lijuan Wang and Zicheng Liu. 2021. End-to-end human pose and mesh reconstruction with transformers. In CVPR. 1954--1963.  Kevin Lin Lijuan Wang and Zicheng Liu. 2021. End-to-end human pose and mesh reconstruction with transformers. In CVPR. 1954--1963.","DOI":"10.1109\/CVPR46437.2021.00199"},{"key":"e_1_3_2_2_24_1","volume-title":"Microsoft coco: Common objects in context","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin , Michael Maire , Serge Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r , and C Lawrence Zitnick . 2014. Microsoft coco: Common objects in context . In ECCV. Springer , 740--755. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV. Springer, 740--755."},{"key":"e_1_3_2_2_25_1","volume-title":"A graph attention spatio-temporal convolutional network for 3D human pose estimation in video","author":"Liu Junfa","unstructured":"Junfa Liu , Juan Rojas , Yihui Li , Zhijun Liang , Yisheng Guan , Ning Xi , and Haifei Zhu . 2021. 
A graph attention spatio-temporal convolutional network for 3D human pose estimation in video . In ICRA. IEEE , 3374--3380. Junfa Liu, Juan Rojas, Yihui Li, Zhijun Liang, Yisheng Guan, Ning Xi, and Haifei Zhu. 2021. A graph attention spatio-temporal convolutional network for 3D human pose estimation in video. In ICRA. IEEE, 3374--3380."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Ruixu Liu Ju Shen He Wang Chen Chen Sen-ching Cheung and Vijayan Asari. 2020. Attention mechanism exploits temporal contexts: Real-time 3d human pose reconstruction. In CVPR. 5064--5073.  Ruixu Liu Ju Shen He Wang Chen Chen Sen-ching Cheung and Vijayan Asari. 2020. Attention mechanism exploits temporal contexts: Real-time 3d human pose reconstruction. In CVPR. 5064--5073.","DOI":"10.1109\/CVPR42600.2020.00511"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Ziyu Liu Hongwen Zhang Zhenghao Chen Zhiyong Wang and Wanli Ouyang. 2020. Disentangling and unifying graph convolutions for skeleton-based action recognition. In CVPR. 143--152.  Ziyu Liu Hongwen Zhang Zhenghao Chen Zhiyong Wang and Wanli Ouyang. 2020. Disentangling and unifying graph convolutions for skeleton-based action recognition. In CVPR. 143--152.","DOI":"10.1109\/CVPR42600.2020.00022"},{"key":"e_1_3_2_2_28_1","unstructured":"Zhengyi Luo S Alireza Golestaneh and Kris M Kitani. 2020. 3d human motion estimation via motion compression and refinement. In ACCV.  Zhengyi Luo S Alireza Golestaneh and Kris M Kitani. 2020. 3d human motion estimation via motion compression and refinement. In ACCV."},{"key":"e_1_3_2_2_29_1","volume-title":"Tfpose: Direct human pose estimation with transformers. arXiv preprint arXiv:2103.15320","author":"Mao Weian","year":"2021","unstructured":"Weian Mao , Yongtao Ge , Chunhua Shen , Zhi Tian , Xinlong Wang , and Zhibin Wang . 2021 . Tfpose: Direct human pose estimation with transformers. arXiv preprint arXiv:2103.15320 (2021). 
Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, and Zhibin Wang. 2021. Tfpose: Direct human pose estimation with transformers. arXiv preprint arXiv:2103.15320 (2021)."},{"key":"e_1_3_2_2_30_1","volume-title":"Ju Yong Chang, and Kyoung Mu Lee","author":"Moon Gyeongsik","year":"2019","unstructured":"Gyeongsik Moon , Ju Yong Chang, and Kyoung Mu Lee . 2019 . Camera distance-aware top-down approach for 3d multi-person pose estimation from a single rgb image. In ICCV. 10133--10142. Gyeongsik Moon, Ju Yong Chang, and Kyoung Mu Lee. 2019. Camera distance-aware top-down approach for 3d multi-person pose estimation from a single rgb image. In ICCV. 10133--10142."},{"key":"e_1_3_2_2_31_1","unstructured":"Xuecheng Nie Jiashi Feng Jianfeng Zhang and Shuicheng Yan. 2019. Single-stage multi-person pose machines. In ICCV. 6951--6960.  Xuecheng Nie Jiashi Feng Jianfeng Zhang and Shuicheng Yan. 2019. Single-stage multi-person pose machines. In ICCV. 6951--6960."},{"key":"e_1_3_2_2_32_1","volume-title":"Lingling Tao, Yijing Li, Robert Wang, and Markus Steinberger.","author":"Parger Mathias","year":"2021","unstructured":"Mathias Parger , Chengcheng Tang , Yuanlu Xu , Christopher David Twigg , Lingling Tao, Yijing Li, Robert Wang, and Markus Steinberger. 2021 . UNOC : Understanding occlusion for embodied presence in virtual reality. TVCG ( 2021). Mathias Parger, Chengcheng Tang, Yuanlu Xu, Christopher David Twigg, Lingling Tao, Yijing Li, Robert Wang, and Markus Steinberger. 2021. UNOC: Understanding occlusion for embodied presence in virtual reality. TVCG (2021)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Dario Pavllo Christoph Feichtenhofer David Grangier and Michael Auli. 2019. 3d human pose estimation in video with temporal convolutions and semi-supervised training. In CVPR. 7753--7762.  Dario Pavllo Christoph Feichtenhofer David Grangier and Michael Auli. 2019. 
3d human pose estimation in video with temporal convolutions and semi-supervised training. In CVPR. 7753--7762.","DOI":"10.1109\/CVPR.2019.00794"},{"key":"e_1_3_2_2_34_1","volume-title":"Learning recurrent structure-guided attention network for multi-person pose estimation","author":"Qiu Zhongwei","unstructured":"Zhongwei Qiu , Kai Qiu , Jianlong Fu , and Dongmei Fu. 2019. Learning recurrent structure-guided attention network for multi-person pose estimation . In ICME. IEEE , 418--423. Zhongwei Qiu, Kai Qiu, Jianlong Fu, and Dongmei Fu. 2019. Learning recurrent structure-guided attention network for multi-person pose estimation. In ICME. IEEE, 418--423."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6867"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"crossref","unstructured":"Anurag Ranjan and Michael J Black. 2017. Optical flow estimation using a spatial pyramid network. In CVPR. 4161--4170.  Anurag Ranjan and Michael J Black. 2017. Optical flow estimation using a spatial pyramid network. In CVPR. 4161--4170.","DOI":"10.1109\/CVPR.2017.291"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Yu Sun Qian Bao Wu Liu Yili Fu Michael J Black and Tao Mei. 2021. Monocular one-stage regression of multiple 3d people. In ICCV. 11179--11188.  Yu Sun Qian Bao Wu Liu Yili Fu Michael J Black and Tao Mei. 2021. Monocular one-stage regression of multiple 3d people. In ICCV. 11179--11188.","DOI":"10.1109\/ICCV48922.2021.01099"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"crossref","unstructured":"Yu Sun Yun Ye Wu Liu Wenpeng Gao Yili Fu and Tao Mei. 2019. Human mesh recovery from monocular images via a skeleton-disentangled representation. In ICCV. 5349--5358.  Yu Sun Yun Ye Wu Liu Wenpeng Gao Yili Fu and Tao Mei. 2019. Human mesh recovery from monocular images via a skeleton-disentangled representation. In ICCV. 
5349--5358.","DOI":"10.1109\/ICCV.2019.00545"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"crossref","unstructured":"Timo von Marcard Roberto Henschel Michael J Black Bodo Rosenhahn and Gerard Pons-Moll. 2018. Recovering accurate 3d human pose in the wild using imus and a moving camera. In ECCV. 601--617.  Timo von Marcard Roberto Henschel Michael J Black Bodo Rosenhahn and Gerard Pons-Moll. 2018. Recovering accurate 3d human pose in the wild using imus and a moving camera. In ECCV. 601--617.","DOI":"10.1007\/978-3-030-01249-6_37"},{"key":"e_1_3_2_2_40_1","volume-title":"Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation","author":"Wang Can","year":"2020","unstructured":"Can Wang , Jiefeng Li , Wentao Liu , Chen Qian , and Cewu Lu . 2020 . Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation . In ECCV. Springer , 242--259. Can Wang, Jiefeng Li, Wentao Liu, Chen Qian, and Cewu Lu. 2020. Hmor: Hierarchical multi-person ordinal relations for monocular multi-person 3d pose estimation. In ECCV. Springer, 242--259."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2983686"},{"key":"e_1_3_2_2_42_1","volume-title":"Motion guided 3d pose estimation from videos","author":"Wang Jingbo","unstructured":"Jingbo Wang , Sijie Yan , Yuanjun Xiong , and Dahua Lin . 2020. Motion guided 3d pose estimation from videos . In ECCV. Springer , 764--780. Jingbo Wang, Sijie Yan, Yuanjun Xiong, and Dahua Lin. 2020. Motion guided 3d pose estimation from videos. In ECCV. Springer, 764--780."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"crossref","unstructured":"Zitian Wang Xuecheng Nie Xiaochao Qu Yunpeng Chen and Si Liu. 2022. Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation. In CVPR.  Zitian Wang Xuecheng Nie Xiaochao Qu Yunpeng Chen and Si Liu. 2022. Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation. 
In CVPR.","DOI":"10.1109\/CVPR52688.2022.01275"},{"key":"e_1_3_2_2_44_1","volume-title":"Transpose: Keypoint localization via transformer. In ICCV. 11802--11812.","author":"Yang Sen","year":"2021","unstructured":"Sen Yang , Zhibin Quan , Mu Nie , and Wankou Yang . 2021 . Transpose: Keypoint localization via transformer. In ICCV. 11802--11812. Sen Yang, Zhibin Quan, Mu Nie, and Wankou Yang. 2021. Transpose: Keypoint localization via transformer. In ICCV. 11802--11812."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"crossref","unstructured":"Wei Yang Wanli Ouyang Xiaolong Wang Jimmy Ren Hongsheng Li and Xiaogang Wang. 2018. 3d human pose estimation in the wild by adversarial learning. In CVPR. 5255--5264.  Wei Yang Wanli Ouyang Xiaolong Wang Jimmy Ren Hongsheng Li and Xiaogang Wang. 2018. 3d human pose estimation in the wild by adversarial learning. In CVPR. 5255--5264.","DOI":"10.1109\/CVPR.2018.00551"},{"key":"e_1_3_2_2_46_1","volume-title":"Chirality nets for human pose regression. NeurIPS 32","author":"Yeh Raymond","year":"2019","unstructured":"Raymond Yeh , Yuan-Ting Hu , and Alexander Schwing . 2019. Chirality nets for human pose regression. NeurIPS 32 ( 2019 ). Raymond Yeh, Yuan-Ting Hu, and Alexander Schwing. 2019. Chirality nets for human pose regression. NeurIPS 32 (2019)."},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"crossref","unstructured":"Andrei Zanfir Elisabeta Marinoiu and Cristian Sminchisescu. 2018. Monocular 3d pose and shape estimation of multiple people in natural scenes-the importance of multiple scene constraints. In CVPR. 2148--2157.  Andrei Zanfir Elisabeta Marinoiu and Cristian Sminchisescu. 2018. Monocular 3d pose and shape estimation of multiple people in natural scenes-the importance of multiple scene constraints. In CVPR. 2148--2157.","DOI":"10.1109\/CVPR.2018.00229"},{"key":"e_1_3_2_2_48_1","volume-title":"Deep network for the integrated 3d sensing of multiple people in natural images. 
NeurIPS 31","author":"Zanfir Andrei","year":"2018","unstructured":"Andrei Zanfir , Elisabeta Marinoiu , Mihai Zanfir , Alin-Ionut Popa , and Cristian Sminchisescu . 2018. Deep network for the integrated 3d sensing of multiple people in natural images. NeurIPS 31 ( 2018 ). Andrei Zanfir, Elisabeta Marinoiu, Mihai Zanfir, Alin-Ionut Popa, and Cristian Sminchisescu. 2018. Deep network for the integrated 3d sensing of multiple people in natural images. NeurIPS 31 (2018)."},{"key":"e_1_3_2_2_49_1","volume-title":"Srnet: Improving generalization in 3d human pose estimation with a split-and-recombine approach","author":"Zeng Ailing","year":"2020","unstructured":"Ailing Zeng , Xiao Sun , Fuyang Huang , Minhao Liu , Qiang Xu , and Stephen Lin . 2020 . Srnet: Improving generalization in 3d human pose estimation with a split-and-recombine approach . In ECCV. Springer , 507--523. Ailing Zeng, Xiao Sun, Fuyang Huang, Minhao Liu, Qiang Xu, and Stephen Lin. 2020. Srnet: Improving generalization in 3d human pose estimation with a split-and-recombine approach. In ECCV. Springer, 507--523."},{"key":"e_1_3_2_2_50_1","volume-title":"Smap: Single-shot multi-person absolute 3d pose estimation","author":"Zhen Jianan","year":"2020","unstructured":"Jianan Zhen , Qi Fang , Jiaming Sun , Wentao Liu , Wei Jiang , Hujun Bao , and Xiaowei Zhou . 2020 . Smap: Single-shot multi-person absolute 3d pose estimation . In ECCV. Springer , 550--566. Jianan Zhen, Qi Fang, Jiaming Sun, Wentao Liu, Wei Jiang, Hujun Bao, and Xiaowei Zhou. 2020. Smap: Single-shot multi-person absolute 3d pose estimation. In ECCV. Springer, 550--566."},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Ce Zheng Sijie Zhu Matias Mendieta Taojiannan Yang Chen Chen and Zhengming Ding. 2021. 3d human pose estimation with spatial and temporal transformers. In ICCV. 11656--11665.  Ce Zheng Sijie Zhu Matias Mendieta Taojiannan Yang Chen Chen and Zhengming Ding. 2021. 
3d human pose estimation with spatial and temporal transformers. In ICCV. 11656--11665.","DOI":"10.1109\/ICCV48922.2021.01145"},{"key":"e_1_3_2_2_52_1","volume-title":"Objects as points. arXiv preprint arXiv:1904.07850","author":"Zhou Xingyi","year":"2019","unstructured":"Xingyi Zhou , Dequan Wang , and Philipp Kr\u00e4henb\u00fchl . 2019. Objects as points. arXiv preprint arXiv:1904.07850 ( 2019 ). Xingyi Zhou, Dequan Wang, and Philipp Kr\u00e4henb\u00fchl. 2019. Objects as points. arXiv preprint arXiv:1904.07850 (2019)."}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547871","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3547871","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:35Z","timestamp":1750186955000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547871"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":52,"alternative-id":["10.1145\/3503161.3547871","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3547871","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}