{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,6]],"date-time":"2025-10-06T18:39:05Z","timestamp":1759775945864,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,1,7]],"date-time":"2022-01-07T00:00:00Z","timestamp":1641513600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,1,7]]},"DOI":"10.1145\/3512388.3512432","type":"proceedings-article","created":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T02:28:13Z","timestamp":1648520893000},"page":"303-309","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["RMTrack: 6D Object Pose Tracking by Continuous Image Render Match"],"prefix":"10.1145","author":[{"given":"Enyuan","family":"Cao","sequence":"first","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, School of Artificial Intelligence, China"}]},{"given":"Xiaoyang","family":"Zhu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, China"}]},{"given":"Haitao","family":"Yu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, China"}]},{"given":"Yongshi","family":"Jiang","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, China"}]}],"member":"320","published-online":{"date-parts":[[2022,3,28]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3356250.3360044"},{"key":"e_1_3_2_1_2_1","volume-title":"MAC Annual Meeting Presentations","volume":"2021","author":"Barsan 
Erin","year":"2021","unstructured":"Erin Barsan and Hilary Wang. Dpoe-n: Providing professional development funding for digital preservation. In MAC Annual Meeting Presentations, volume 2021. Iowa State University Digital Press, 2021."},{"key":"e_1_3_2_1_3_1","first-page":"536","volume-title":"European conference on computer vision","author":"Brachmann Eric","unstructured":"Eric Brachmann, Alexander Krull, Frank Michel, Stefan Gumhold, Jamie Shotton, and Carsten Rother. Learning 6d object pose estimation using 3d object coordinates. In European conference on computer vision, pages 536\u2013551. Springer, 2014."},{"key":"e_1_3_2_1_4_1","first-page":"2441","volume-title":"2016 IEEE International conference on Robotics and Automation (ICRA)","author":"Cao Zhe","unstructured":"Zhe Cao, Yaser Sheikh, and Natasha Kholgade Banerjee. Real-time scalable 6dof pose estimation for textureless objects. In 2016 IEEE International conference on Robotics and Automation (ICRA), pages 2441\u20132448. IEEE, 2016."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_3_2_1_6_1","volume-title":"An image is worth 16x16 words: Transformers for image recognition at scale. 
arXiv preprint arXiv:2010.11929","author":"Dosovitskiy Alexey","year":"2020","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.316"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01165"},{"key":"e_1_3_2_1_9_1","volume-title":"Gradient response maps for real-time detection of textureless objects","author":"Hinterstoisser Stefan","year":"2011","unstructured":"Stefan Hinterstoisser, Cedric Cagniart, Slobodan Ilic, Peter Sturm, Nassir Navab, Pascal Fua, and Vincent Lepetit. Gradient response maps for real-time detection of textureless objects. IEEE transactions on pattern analysis and machine intelligence, 34(5):876\u2013888, 2011."},{"key":"e_1_3_2_1_10_1","first-page":"548","volume-title":"Asian conference on computer vision","author":"Hinterstoisser Stefan","unstructured":"Stefan Hinterstoisser, Vincent Lepetit, Slobodan Ilic, Stefan Holzer, Gary Bradski, Kurt Konolige, and Nassir Navab. Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes. In Asian conference on computer vision, pages 548\u2013562. Springer, 2012."},
{"key":"e_1_3_2_1_11_1","volume-title":"Repose: Real-time iterative rendering and refinement for 6d object pose estimation. arXiv preprint arXiv:2104.00633","author":"Iwase Shun","year":"2021","unstructured":"Shun Iwase, Xingyu Liu, Rawal Khirodkar, Rio Yokota, and Kris M Kitani. Repose: Real-time iterative rendering and refinement for 6d object pose estimation. arXiv preprint arXiv:2104.00633, 2021."},{"key":"e_1_3_2_1_12_1","first-page":"493","volume-title":"Computer Graphics Forum","author":"Koulieris George Alex","unstructured":"George Alex Koulieris, Kaan Ak\u015fit, Michael Stengel, Rafa\u0142 K Mantiuk, Katerina Mania, and Christian Richardt. Near-eye display and tracking technologies for virtual and augmented reality. In Computer Graphics Forum, volume 38, pages 493\u2013519. 
Wiley Online Library, 2019."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_42"},{"key":"e_1_3_2_1_14_1","first-page":"1150","volume-title":"Proceedings of the seventh IEEE international conference on computer vision","volume":"2","author":"Lowe David G","unstructured":"David G Lowe. Object recognition from local scale-invariant features. In Proceedings of the seventh IEEE international conference on computer vision, volume 2, pages 1150\u20131157. IEEE, 1999."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_49"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.540"},{"key":"e_1_3_2_1_17_1","first-page":"2011","volume-title":"2017 IEEE international conference on robotics and automation (ICRA)","author":"Pavlakos Georgios","unstructured":"Georgios Pavlakos, Xiaowei Zhou, Aaron Chan, Konstantinos G Derpanis, and Kostas Daniilidis. 6-dof object pose from semantic keypoints. In 2017 IEEE international conference on robotics and automation (ICRA), pages 2011\u20132018. 
IEEE, 2017."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00469"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.413"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-005-3674-1"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350984"},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Stoiber Manuel","year":"2020","unstructured":"Manuel Stoiber, Martin Pfanne, Klaus H Strobl, Rudolph Triebel, and Alin Albu-Sch\u00e4ffer. A sparse gaussian approach to region-based 6dof object tracking. In Proceedings of the Asian Conference on Computer Vision, 2020."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2950449"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.23"},{"key":"e_1_3_2_1_25_1","volume-title":"A region-based gauss-newton approach to real-time monocular multiple object tracking","author":"Tjaden Henning","year":"2018","unstructured":"Henning Tjaden, Ulrich Schwanecke, Elmar Sch\u00f6mer, and Daniel Cremers. A region-based gauss-newton approach to real-time monocular multiple object tracking. IEEE transactions on pattern analysis and machine intelligence, 41(8):1797\u20131812, 2018."},{"key":"e_1_3_2_1_26_1","volume-title":"Data-driven 6d pose tracking by calibrating image residuals in synthetic domains. 
arXiv preprint arXiv:2105.14391","author":"Wen Bowen","year":"2021","unstructured":"Bowen Wen, Chaitanya Mitash, and Kostas Bekris. Data-driven 6d pose tracking by calibrating image residuals in synthetic domains. arXiv preprint arXiv:2105.14391, 2021."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298930"},{"key":"e_1_3_2_1_28_1","volume-title":"Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199","author":"Xiang Yu","year":"2017","unstructured":"Yu Xiang, Tanner Schmidt, Venkatraman Narayanan, and Dieter Fox. Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199, 2017."},{"key":"e_1_3_2_1_29_1","volume-title":"Pose flow: Efficient online pose tracking. arXiv preprint arXiv:1802.00977","author":"Xiu Yuliang","year":"2018","unstructured":"Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, and Cewu Lu. Pose flow: Efficient online pose tracking. arXiv preprint arXiv:1802.00977, 2018."},
{"key":"e_1_3_2_1_30_1","volume-title":"Tokens-to-token vit: Training vision transformers from scratch on imagenet. arXiv preprint arXiv:2101.11986","author":"Yuan Li","year":"2021","unstructured":"Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, and Shuicheng Yan. Tokens-to-token vit: Training vision transformers from scratch on imagenet. arXiv preprint arXiv:2101.11986, 2021."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00203"},{"key":"e_1_3_2_1_32_1","first-page":"653","volume-title":"European Conference on Computer Vision","author":"Zhang Jirong","unstructured":"Jirong Zhang, Chuan Wang, Shuaicheng Liu, Lanpeng Jia, Nianjin Ye, Jue Wang, Ji Zhou, and Jian Sun. Content-aware unsupervised deep homography estimation. In European Conference on Computer Vision, pages 653\u2013669. Springer, 2020."},{"key":"e_1_3_2_1_33_1","volume-title":"Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159","author":"Zhu Xizhou","year":"2020","unstructured":"Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable detr: Deformable transformers for end-to-end object detection. 
arXiv preprint arXiv:2010.04159, 2020."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.18178\/joig.9.2.50-54"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.18178\/joig.6.2.174-180"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.18178\/joig.6.1.21-26"}],"event":{"name":"ICIGP 2022: 2022 the 5th International Conference on Image and Graphics Processing","acronym":"ICIGP 2022","location":"Beijing China"},"container-title":["2022 the 5th International Conference on Image and Graphics Processing (ICIGP)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512388.3512432","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3512388.3512432","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:43Z","timestamp":1750191103000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3512388.3512432"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,7]]},"references-count":36,"alternative-id":["10.1145\/3512388.3512432","10.1145\/3512388"],"URL":"https:\/\/doi.org\/10.1145\/3512388.3512432","relation":{},"subject":[],"published":{"date-parts":[[2022,1,7]]},"assertion":[{"value":"2022-03-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}