{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T00:11:50Z","timestamp":1777594310577,"version":"3.51.4"},"reference-count":180,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,11,21]],"date-time":"2022-11-21T00:00:00Z","timestamp":1668988800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2020YFB2104100"],"award-info":[{"award-number":["2020YFB2104100"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62172421 and 62072459"],"award-info":[{"award-number":["62172421 and 62072459"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2023,4,30]]},"abstract":"<jats:p>Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising one that has shown better performance than others. However, survey study about the latest development of deep learning-based methods is lacking. Therefore, this study presents a comprehensive review of recent progress in object pose detection and tracking that belongs to the deep learning technical route. 
To achieve a more thorough introduction, the scope of this study is limited to methods taking monocular RGB\/RGBD data as input and covering three kinds of major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. In our work, metrics, datasets, and methods of both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.<\/jats:p>","DOI":"10.1145\/3524496","type":"journal-article","created":{"date-parts":[[2022,3,31]],"date-time":"2022-03-31T12:01:16Z","timestamp":1648728076000},"page":"1-40","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":89,"title":["Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview"],"prefix":"10.1145","volume":"55","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6324-1712","authenticated-orcid":false,"given":"Zhaoxin","family":"Fan","sequence":"first","affiliation":[{"name":"Key Laboratory of Data Engineering and Knowledge Engineering of MOE, School of Information, Renmin University of China, Beijing, China, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yazhi","family":"Zhu","sequence":"additional","affiliation":[{"name":"Institute of Information Science, Beijing Jiaotong University, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yulin","family":"He","sequence":"additional","affiliation":[{"name":"Key Laboratory of Data Engineering and Knowledge Engineering of MOE, School of Information, Renmin University of China, Beijing, China, P.R. 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qi","family":"Sun","sequence":"additional","affiliation":[{"name":"Key Laboratory of Data Engineering and Knowledge Engineering of MOE, School of Information, Renmin University of China, Beijing, China, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongyan","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Economics and Management, Tsinghua University, Beijing, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"He","sequence":"additional","affiliation":[{"name":"Key Laboratory of Data Engineering and Knowledge Engineering of MOE, School of Information, Renmin University of China, Beijing, China, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,21]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Instant 3D object tracking with applications in augmented reality","author":"Ahmadyan Adel","year":"2020","unstructured":"Adel Ahmadyan, Tingbo Hou, Jianing Wei, Liangkai Zhang, Artsiom Ablavatski, and Matthias Grundmann. 2020. Instant 3D object tracking with applications in augmented reality. 
arXiv preprint arXiv:2006.13194 (2020).","journal-title":"arXiv preprint arXiv:2006.13194"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00773"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2019.2892405"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.416"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460875"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10605-2_35"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.366"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00938"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58592-1_9"},{"key":"e_1_3_1_11_2","article-title":"EfficientPose\u2013An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach","author":"Bukschat Yannick","year":"2020","unstructured":"Yannick Bukschat and Marcus Vetter. 2020. EfficientPose\u2013An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach. arXiv preprint arXiv:2011.04307 (2020).","journal-title":"arXiv preprint arXiv:2011.04307"},{"key":"e_1_3_1_12_2","article-title":"I like to move it: 6D pose estimation as an action decision process","author":"Busam Benjamin","year":"2020","unstructured":"Benjamin Busam, Hyun Jun Jung, and Nassir Navab. 2020. I like to move it: 6D pose estimation as an action decision process. arXiv preprint arXiv:2009.12678 (2020).","journal-title":"arXiv preprint arXiv:2009.12678"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICAR.2015.7251504"},{"key":"e_1_3_1_15_2","article-title":"ShapeNet: An information-rich 3D model repository","author":"Chang Angel X.","year":"2015","unstructured":"Angel X. 
Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et\u00a0al. 2015. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015).","journal-title":"arXiv preprint arXiv:1512.03012"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00812"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01199"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/3294771.3294842"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.23919\/CCC50068.2020.9189304"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00277"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093272"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00429"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00163"},{"key":"e_1_3_1_24_2","first-page":"9609","article-title":"Learning to predict 3D objects with an interpolation-based differentiable renderer","volume":"32","author":"Chen Wenzheng","year":"2019","unstructured":"Wenzheng Chen, Huan Ling, Jun Gao, Edward Smith, Jaakko Lehtinen, Alec Jacobson, and Sanja Fidler. 2019. Learning to predict 3D objects with an interpolation-based differentiable renderer. Adv. Neural Inf. Process. Syst. 32 (2019), 9609\u20139619.","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3034386"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58574-7_9"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.236"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2018.02086"},{"key":"e_1_3_1_29_2","article-title":"PoseRBPF: A Rao\u2013Blackwellized particle filter for 6-D object pose tracking","author":"Deng Xinke","year":"2021","unstructured":"Xinke Deng, Arsalan Mousavian, Yu Xiang, Fei Xia, Timothy Bretl, and Dieter Fox. 2021. PoseRBPF: A Rao\u2013Blackwellized particle filter for 6-D object pose tracking. IEEE Trans. Robot. 37, 5 (2021), 1328\u20131342.","journal-title":"IEEE Trans. Robot."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196714"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01169"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.316"},{"key":"e_1_3_1_33_2","article-title":"Vision-based robotic grasping from object localization pose estimation grasp detection to motion planning: A review","author":"Du Guoguang","year":"2019","unstructured":"Guoguang Du, Kai Wang, and Shiguo Lian. 2019. Vision-based robotic grasping from object localization pose estimation grasp detection to motion planning: A review. arXiv preprint arXiv:1905.06658 (2019).","journal-title":"arXiv preprint arXiv:1905.06658"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00667"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_4"},{"key":"e_1_3_1_36_2","article-title":"ACR-Pose: Adversarial canonical representation reconstruction network for category level 6D object pose estimation","author":"Fan Zhaoxin","year":"2021","unstructured":"Zhaoxin Fan, Zhengbo Song, Jian Xu, Zhicheng Wang, Kejian Wu, Hongyan Liu, and Jun He. 2021. 
ACR-Pose: Adversarial canonical representation reconstruction network for category level 6D object pose estimation. arXiv preprint arXiv:2111.10524 (2021).","journal-title":"arXiv preprint arXiv:2111.10524"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2020.11.002"},{"key":"e_1_3_1_38_2","article-title":"CloudAAE: Learning 6D object pose regression with on-line data synthesis on point clouds","author":"Gao Ge","year":"2021","unstructured":"Ge Gao, Mikko Lauri, Xiaolin Hu, Jianwei Zhang, and Simone Frintrop. 2021. CloudAAE: Learning 6D object pose regression with on-line data synthesis on point clouds. arXiv preprint arXiv:2103.01977 (2021).","journal-title":"arXiv preprint arXiv:2103.01977"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9197461"},{"key":"e_1_3_1_40_2","article-title":"Monocular 3D object detection with sequential feature association and depth hint augmentation","author":"Gao Tianze","year":"2020","unstructured":"Tianze Gao, Huihui Pan, and Huijun Gao. 2020. Monocular 3D object detection with sequential feature association and depth hint augmentation. arXiv preprint arXiv:2011.14589 (2020).","journal-title":"arXiv preprint arXiv:2011.14589"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2017.2734599"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2018.10.001"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21918"},{"key":"e_1_3_1_45_2","article-title":"Deep learning for 3D point clouds: A survey","author":"Guo Yulan","year":"2020","unstructured":"Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. 2020. Deep learning for 3D point clouds: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 43, 12 (2020), 4338\u20134364.","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00302"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01165"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.206"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126326"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33885-4_60"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01172"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2017.103"},{"key":"e_1_3_1_56_2","first-page":"606","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Hoda\u0148 Tom\u00e1\u0161","year":"2016","unstructured":"Tom\u00e1\u0161 Hoda\u0148, Ji\u0159\u00ed Matas, and \u0160t\u011bp\u00e1n Obdr\u017e\u00e1lek. 2016. On evaluation of 6D object pose estimation. In Proceedings of the European Conference on Computer Vision. Springer, 606\u2013619."},{"key":"e_1_3_1_57_2","article-title":"MobilePose: Real-time pose estimation for unseen objects with weak shape supervision","author":"Hou Tingbo","year":"2020","unstructured":"Tingbo Hou, Adel Ahmadyan, Liangkai Zhang, Jianing Wei, and Matthias Grundmann. 2020. MobilePose: Real-time pose estimation for unseen objects with weak shape supervision. 
arXiv preprint arXiv:2003.03522 (2020).","journal-title":"arXiv preprint arXiv:2003.03522"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00549"},{"key":"e_1_3_1_59_2","article-title":"Monocular quasi-dense 3D object tracking","author":"Hu Hou-Ning","year":"2021","unstructured":"Hou-Ning Hu, Yung-Hsu Yang, Tobias Fischer, Trevor Darrell, Fisher Yu, and Min Sun. 2021. Monocular quasi-dense 3D object tracking. arXiv preprint arXiv:2103.07351 (2021).","journal-title":"arXiv preprint arXiv:2103.07351"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00300"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00350"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00141"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.232073"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compedu.2018.05.002"},{"key":"e_1_3_1_65_2","first-page":"477","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Jafari Omid Hosseini","year":"2018","unstructured":"Omid Hosseini Jafari, Siva Karthik Mustikovela, Karl Pertsch, Eric Brachmann, and Carsten Rother. 2018. iPose: Instance-aware 6D pose estimation of partly occluded objects. In Proceedings of the Asian Conference on Computer Vision. Springer, 477\u2013492."},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01291"},{"key":"e_1_3_1_67_2","article-title":"Monocular 3D object detection and box fitting trained end-to-end using intersection-over-union loss","author":"J\u00f6rgensen Eskil","year":"2019","unstructured":"Eskil J\u00f6rgensen, Christopher Zach, and Fredrik Kahl. 2019. Monocular 3D object detection and box fitting trained end-to-end using intersection-over-union loss. 
arXiv preprint arXiv:1906.08070 (2019).","journal-title":"arXiv preprint arXiv:1906.08070"},{"key":"e_1_3_1_68_2","doi-asserted-by":"crossref","unstructured":"Rudolph Emil Kalman. 1960. A new approach to linear filtering and prediction problems. 35\u201345.","DOI":"10.1115\/1.3662552"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00338"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.169"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46487-9_13"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1139"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1007\/s43154-020-00021-6"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.115"},{"key":"e_1_3_1_75_2","first-page":"384","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Krull Alexander","year":"2014","unstructured":"Alexander Krull, Frank Michel, Eric Brachmann, Stefan Gumhold, Stephan Ihrke, and Carsten Rother. 2014. 6-DoF model based tracking via object coordinate regression. In Proceedings of the Asian Conference on Computer Vision. Springer, 384\u2013399."},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1002\/nav.3800020109"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3110538"},{"key":"e_1_3_1_78_2","article-title":"Motion-Nets: 6D tracking of unknown objects in unseen environments using RGB","author":"Leeb Felix","year":"2019","unstructured":"Felix Leeb, Arunkumar Byravan, and Dieter Fox. 2019. Motion-Nets: 6D tracking of unknown objects in unseen environments using RGB. 
arXiv preprint arXiv:1910.13942 (2019).","journal-title":"arXiv preprint arXiv:1910.13942"},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-008-0152-6"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/IVS.2011.5940562"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00111"},{"key":"e_1_3_1_82_2","article-title":"Monocular 3D detection with geometric constraint embedding and semi-supervised training","author":"Li Peixuan","year":"2021","unstructured":"Peixuan Li and Zhao Huaici. 2021. Monocular 3D detection with geometric constraint embedding and semi-supervised training. IEEE Robot. Automat. Lett. 6, 3 (2021), 5565\u20135572.","journal-title":"IEEE Robot. Automat. Lett."},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58580-8_38"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00376"},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_42"},{"key":"e_1_3_1_86_2","article-title":"Robust RGB-based 6-DoF pose estimation without real pose annotations","author":"Li Zhigang","year":"2020","unstructured":"Zhigang Li, Yinlin Hu, Mathieu Salzmann, and Xiangyang Ji. 2020. Robust RGB-based 6-DoF pose estimation without real pose annotations. arXiv preprint arXiv:2008.08391 (2020).","journal-title":"arXiv preprint arXiv:2008.08391"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00777"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00354"},{"key":"e_1_3_1_89_2","article-title":"Single-stage keypoint-based category-level object pose estimation from an RGB image","author":"Lin Yunzhi","year":"2021","unstructured":"Yunzhi Lin, Jonathan Tremblay, Stephen Tyree, Patricio A. Vela, and Stan Birchfield. 2021. Single-stage keypoint-based category-level object pose estimation from an RGB image. 
arXiv preprint arXiv:2109.06161 (2021).","journal-title":"arXiv preprint arXiv:2109.06161"},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00115"},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_32"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3052442"},{"key":"e_1_3_1_94_2","article-title":"Swin transformer: Hierarchical vision transformer using shifted windows","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the International Conference on Computer Vision (ICCV).","journal-title":"Proceedings of the International Conference on Computer Vision (ICCV)"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00506"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_3_1_97_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58601-0_19"},{"key":"e_1_3_1_98_2","unstructured":"F. Manhardt, G. Wang, B. Busam, et\u00a0al. 2020. CPS++: Improving class-level 6D pose and shape estimation from monocular images with self-supervised learning. arXiv preprint arXiv:2003.0584."},{"key":"e_1_3_1_99_2","first-page":"690","volume-title":"VISIGRAPP (5: VISAPP)","author":"Majcher Mateusz","year":"2020","unstructured":"Mateusz Majcher and Bogdan Kwolek. 2020. 3D model-based 6D object pose tracking on RGB images using particle filtering and heuristic optimization. In VISIGRAPP (5: VISAPP). 
690\u2013697."},{"key":"e_1_3_1_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00217"},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_49"},{"key":"e_1_3_1_102_2","article-title":"CPS++: Improving class-level 6D pose and shape estimation from monocular images with self-supervised learning","author":"Manhardt Fabian","year":"2020","unstructured":"Fabian Manhardt, Gu Wang, Benjamin Busam, Manuel Nickel, Sven Meier, Luca Minciullo, Xiangyang Ji, and Nassir Navab. 2020. CPS++: Improving class-level 6D pose and shape estimation from monocular images with self-supervised learning. arXiv e-prints (2020).","journal-title":"arXiv e-prints"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-66096-3_45"},{"key":"e_1_3_1_104_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21735-7_7"},{"key":"e_1_3_1_105_2","volume-title":"Autonomous Driving: Technical, Legal and Social Aspects","author":"Maurer Markus","year":"2016","unstructured":"Markus Maurer, J. Christian Gerdes, Barbara Lenz, and Hermann Winner. 2016. Autonomous Driving: Technical, Legal and Social Aspects. 
Springer Nature."},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2018.XIV.021"},{"key":"e_1_3_1_107_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.597"},{"key":"e_1_3_1_108_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49409-8_18"},{"key":"e_1_3_1_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2858446"},{"key":"e_1_3_1_110_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_8"},{"key":"e_1_3_1_111_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00313"},{"key":"e_1_3_1_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794448"},{"key":"e_1_3_1_113_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00776"},{"key":"e_1_3_1_114_2","doi-asserted-by":"publisher","DOI":"10.1051\/matecconf\/201927702029"},{"key":"e_1_3_1_115_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989233"},{"key":"e_1_3_1_116_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-54502-8"},{"key":"e_1_3_1_117_2","article-title":"OCM3D: Object-centric monocular 3D object detection","author":"Peng Liang","year":"2021","unstructured":"Liang Peng, Fei Liu, Senbo Yan, Xiaofei He, and Deng Cai. 2021. OCM3D: Object-centric monocular 3D object detection. arXiv preprint arXiv:2104.06041 (2021).","journal-title":"arXiv preprint arXiv:2104.06041"},{"key":"e_1_3_1_118_2","article-title":"Lidar point cloud guided monocular 3D object detection","author":"Peng Liang","year":"2021","unstructured":"Liang Peng, Fei Liu, Zhengxu Yu, Senbo Yan, Dan Deng, and Deng Cai. 2021. Lidar point cloud guided monocular 3D object detection. 
arXiv preprint arXiv:2104.09035 (2021).","journal-title":"arXiv preprint arXiv:2104.09035"},{"key":"e_1_3_1_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00469"},{"key":"e_1_3_1_120_2","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652\u2013660."},{"key":"e_1_3_1_121_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00592"},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018851"},{"key":"e_1_3_1_123_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.413"},{"key":"e_1_3_1_124_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00845"},{"key":"e_1_3_1_125_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_1_126_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_1_127_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-28603-3_11"},{"key":"e_1_3_1_128_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2020.103898"},{"key":"e_1_3_1_129_2","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV) Workshops","author":"Sahin Caner","year":"2018","unstructured":"Caner Sahin and Tae-Kyun Kim. 2018. Category-level 6D object pose recovery in depth images. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops."},{"key":"e_1_3_1_130_2","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV) Workshops","author":"Sahin Caner","year":"2018","unstructured":"Caner Sahin and Tae-Kyun Kim. 2018. Recovering 6D object pose: A review and multi-modal analysis. 
In Proceedings of the European Conference on Computer Vision (ECCV) Workshops."},{"key":"e_1_3_1_131_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_132_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58526-6_6"},{"key":"e_1_3_1_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.377"},{"key":"e_1_3_1_134_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2017.8296851"},{"key":"e_1_3_1_135_2","article-title":"Demystifying Pseudo-LiDAR for monocular 3D object detection","author":"Simonelli Andrea","year":"2020","unstructured":"Andrea Simonelli, Samuel Rota Bul\u00f2, Lorenzo Porzi, Peter Kontschieder, and Elisa Ricci. 2020. Demystifying Pseudo-LiDAR for monocular 3D object detection. arXiv preprint arXiv:2012.05796 (2020).","journal-title":"arXiv preprint arXiv:2012.05796"},{"key":"e_1_3_1_136_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00208"},{"key":"e_1_3_1_137_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DV50981.2020.00039"},{"key":"e_1_3_1_138_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00051"},{"key":"e_1_3_1_139_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45404-7_20"},{"key":"e_1_3_1_140_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_43"},{"key":"e_1_3_1_141_2","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning. 
PMLR, 6105\u20136114."},{"key":"e_1_3_1_142_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"e_1_3_1_143_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00038"},{"key":"e_1_3_1_144_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58589-1_32"},{"key":"e_1_3_1_145_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8202133"},{"key":"e_1_3_1_146_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00243"},{"key":"e_1_3_1_147_2","volume-title":"Proceedings of the Conference on Robot Learning (CoRL)","author":"Tremblay Jonathan","year":"2018","unstructured":"Jonathan Tremblay, Thang To, Balakumar Sundaralingam, Yu Xiang, Dieter Fox, and Stan Birchfield. 2018. Deep object pose estimation for semantic robotic grasping of household objects. In Proceedings of the Conference on Robot Learning (CoRL)."},{"issue":"04","key":"e_1_3_1_148_2","first-page":"376","article-title":"Least-squares estimation of transformation parameters between two point patterns","volume":"13","author":"Umeyama Shinji","year":"1991","unstructured":"Shinji Umeyama. 1991. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13, 04 (1991), 376\u2013380.","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"e_1_3_1_149_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01455"},{"key":"e_1_3_1_150_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196679"},{"key":"e_1_3_1_151_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00346"},{"key":"e_1_3_1_152_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_7"},{"key":"e_1_3_1_153_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01634"},{"key":"e_1_3_1_154_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00275"},{"key":"e_1_3_1_155_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2018.2888904"},{"key":"e_1_3_1_156_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW54120.2021.00107"},{"key":"e_1_3_1_157_2","article-title":"Probabilistic and geometric depth: Detecting objects in perspective","author":"Wang Tai","year":"2021","unstructured":"Tai Wang, Xinge Zhu, Jiangmiao Pang, and Dahua Lin. 2021. Probabilistic and geometric depth: Detecting objects in perspective. arXiv preprint arXiv:2107.14160 (2021).","journal-title":"arXiv preprint arXiv:2107.14160"},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"e_1_3_1_159_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00864"},{"key":"e_1_3_1_160_2","article-title":"Instant motion tracking and its applications to augmented reality","author":"Wei Jianing","year":"2019","unstructured":"Jianing Wei, Genzhi Ye, Tyler Mullen, Matthias Grundmann, Adel Ahmadyan, and Tingbo Hou. 2019. Instant motion tracking and its applications to augmented reality. 
arXiv preprint arXiv:1907.06796 (2019).","journal-title":"arXiv preprint arXiv:1907.06796"},{"key":"e_1_3_1_161_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341314"},{"key":"e_1_3_1_162_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2019.00114"},{"key":"e_1_3_1_163_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341164"},{"key":"e_1_3_1_164_2","article-title":"Joint 3D tracking and forecasting with graph neural network and diversity sampling","author":"Weng Xinshuo","year":"2020","unstructured":"Xinshuo Weng, Ye Yuan, and Kris Kitani. 2020. Joint 3D tracking and forecasting with graph neural network and diversity sampling. arXiv preprint arXiv:2003.07847 (2020).","journal-title":"arXiv preprint arXiv:2003.07847"},{"key":"e_1_3_1_165_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298930"},{"key":"e_1_3_1_166_2","article-title":"Vote from the center: 6 DoF pose estimation in RGB-D images by radial keypoint voting","author":"Wu Yangzheng","year":"2021","unstructured":"Yangzheng Wu, Mohsen Zand, Ali Etemad, and Michael Greenspan. 2021. Vote from the center: 6 DoF pose estimation in RGB-D images by radial keypoint voting. arXiv preprint arXiv:2104.02527 (2021).","journal-title":"arXiv preprint arXiv:2104.02527"},{"key":"e_1_3_1_167_2","article-title":"PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes","author":"Xiang Yu","year":"2017","unstructured":"Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. 2017. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. 
arXiv preprint arXiv:1711.00199 (2017).","journal-title":"arXiv preprint arXiv:1711.00199"},{"key":"e_1_3_1_168_2","first-page":"802","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems","author":"Shi Xingjian","year":"2015","unstructured":"Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the Conference on Advances in Neural Information Processing Systems. 802\u2013810."},{"key":"e_1_3_1_169_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00390"},{"key":"e_1_3_1_170_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58545-7_2"},{"key":"e_1_3_1_171_2","article-title":"iNeRF: Inverting neural radiance fields for pose estimation","author":"Yen-Chen Lin","year":"2020","unstructured":"Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, and Tsung-Yi Lin. 2020. iNeRF: Inverting neural radiance fields for pose estimation. arXiv preprint arXiv:2012.05877 (2020).","journal-title":"arXiv preprint arXiv:2012.05877"},{"key":"e_1_3_1_172_2","article-title":"Pseudo-LiDAR++: Accurate depth for 3D object detection in autonomous driving","author":"You Yurong","year":"2020","unstructured":"Yurong You, Yan Wang, Wei-Lun Chao, Divyansh Garg, Geoff Pleiss, Bharath Hariharan, Mark Campbell, and Kilian Q. Weinberger. 2020. Pseudo-LiDAR++: Accurate depth for 3D object detection in autonomous driving. Proceedings of the Conference on International Conference on Learning Representations (ICLR).","journal-title":"Proceedings of the Conference on International Conference on Learning Representations (ICLR)"},{"key":"e_1_3_1_173_2","article-title":"6DoF object pose estimation via differentiable proxy voting loss","author":"Yu Xin","year":"2020","unstructured":"Xin Yu, Zheyu Zhuang, Piotr Koniusz, and Hongdong Li. 2020. 
6DoF object pose estimation via differentiable proxy voting loss. In Proceedings of the British Machine Vision Conference (BMVC).","journal-title":"Proceedings of the British Machine Vision Conference (BMVC)"},{"key":"e_1_3_1_174_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8202207"},{"key":"e_1_3_1_175_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00203"},{"key":"e_1_3_1_176_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"},{"key":"e_1_3_1_177_2","article-title":"Estimating 6D pose from localizing designated surface keypoints","author":"Zhao Zelin","year":"2018","unstructured":"Zelin Zhao, Gao Peng, Haoyu Wang, Hao-Shu Fang, Chengkun Li, and Cewu Lu. 2018. Estimating 6D pose from localizing designated surface keypoints. arXiv preprint arXiv:1812.01387 (2018).","journal-title":"arXiv preprint arXiv:1812.01387"},{"key":"e_1_3_1_178_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3003866"},{"key":"e_1_3_1_179_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58548-8_28"},{"key":"e_1_3_1_180_2","first-page":"11503","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zhou Xichuan","year":"2020","unstructured":"Xichuan Zhou, Yicong Peng, Chunqiao Long, Fengbo Ren, and Cong Shi. 2020. MoNet3D: Towards accurate monocular 3D object localization in real time. In Proceedings of the International Conference on Machine Learning. 
PMLR, 11503\u201311512."},{"key":"e_1_3_1_181_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00093"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524496","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524496","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:52Z","timestamp":1750183792000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524496"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,21]]},"references-count":180,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,4,30]]}},"alternative-id":["10.1145\/3524496"],"URL":"https:\/\/doi.org\/10.1145\/3524496","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,21]]},"assertion":[{"value":"2021-06-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-08","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}