{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:58:21Z","timestamp":1750309101546,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,12,6]],"date-time":"2023-12-06T00:00:00Z","timestamp":1701820800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,6]]},"DOI":"10.1145\/3595916.3626373","type":"proceedings-article","created":{"date-parts":[[2024,1,1]],"date-time":"2024-01-01T16:34:41Z","timestamp":1704126881000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["History-Detr: Optimize Query Initialization Strategy by Using Historical Information and Kinematics"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-0366-8538","authenticated-orcid":false,"given":"Weijie","family":"Luo","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, China and Huixi Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6680-5319","authenticated-orcid":false,"given":"Zihao","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China and Huixi Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0849-3252","authenticated-orcid":false,"given":"Guohao","family":"Dai","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6809-7694","authenticated-orcid":false,"given":"Ningyi","family":"Xu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China and Huixi Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,1]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1090\u20131099","author":"Bai Xuyang","year":"2022","unstructured":"Xuyang Bai, Zeyu Hu, Xinge Zhu, Qingqiu Huang, Yilun Chen, Hongbo Fu, and Chiew-Lan Tai. 2022. Transfusion: Robust lidar-camera fusion for 3d object detection with transformers. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1090\u20131099."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_1_4_1","volume-title":"Polar parametrization for vision-based surround-view 3d detection. arXiv preprint arXiv:2206.10965","author":"Chen Shaoyu","year":"2022","unstructured":"Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Chang Huang, and Wenyu Liu. 2022. Polar parametrization for vision-based surround-view 3d detection. 
arXiv preprint arXiv:2206.10965 (2022)."},{"key":"e_1_3_2_1_5_1","unstructured":"MMDetection3D Contributors. 2020. MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https:\/\/github.com\/open-mmlab\/mmdetection3d."},{"key":"e_1_3_2_1_6_1","volume-title":"International journal of computer vision 88","author":"Everingham Mark","year":"2009","unstructured":"Mark Everingham, Luc Van\u00a0Gool, Christopher\u00a0KI Williams, John Winn, and Andrew Zisserman. 2009. The pascal visual object classes (voc) challenge. International journal of computer vision 88 (2009), 303\u2013308."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 15273\u201315282","author":"Hu Anthony","year":"2021","unstructured":"Anthony Hu, Zak Murez, Nikhil Mohan, Sof\u00eda Dudas, Jeffrey Hawke, Vijay Badrinarayanan, Roberto Cipolla, and Alex Kendall. 2021. FIERY: future instance prediction in bird\u2019s-eye view from surround monocular cameras. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 
15273\u201315282."},{"key":"e_1_3_2_1_9_1","volume-title":"Bevdet4d: Exploit temporal cues in multi-camera 3d object detection. arXiv preprint arXiv:2203.17054","author":"Huang Junjie","year":"2022","unstructured":"Junjie Huang and Guan Huang. 2022. Bevdet4d: Exploit temporal cues in multi-camera 3d object detection. arXiv preprint arXiv:2203.17054 (2022)."},{"key":"e_1_3_2_1_10_1","volume-title":"Bevdet: High-performance multi-camera 3d object detection in bird-eye-view. arXiv preprint arXiv:2112.11790","author":"Huang Junjie","year":"2021","unstructured":"Junjie Huang, Guan Huang, Zheng Zhu, and Dalong Du. 2021. Bevdet: High-performance multi-camera 3d object detection in bird-eye-view. arXiv preprint arXiv:2112.11790 (2021)."},{"key":"e_1_3_2_1_11_1","volume-title":"Polarformer: Multi-camera 3d object detection with polar transformers. arXiv preprint arXiv:2206.15398","author":"Jiang Yanqin","year":"2022","unstructured":"Yanqin Jiang, Li Zhang, Zhenwei Miao, Xiatian Zhu, Jin Gao, Weiming Hu, and Yu-Gang Jiang. 2022. Polarformer: Multi-camera 3d object detection with polar transformers. arXiv preprint arXiv:2206.15398 (2022)."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 13619\u201313627","author":"Li Feng","year":"2022","unstructured":"Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel\u00a0M Ni, and Lei Zhang. 2022. 
Dn-detr: Accelerate detr training by introducing query denoising. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 13619\u201313627."},{"key":"e_1_3_2_1_13_1","volume-title":"Bevstereo: Enhancing depth estimation in multi-view 3d object detection with dynamic temporal stereo. arXiv preprint arXiv:2209.10248","author":"Li Yinhao","year":"2022","unstructured":"Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, and Zeming Li. 2022. Bevstereo: Enhancing depth estimation in multi-view 3d object detection with dynamic temporal stereo. arXiv preprint arXiv:2209.10248 (2022)."},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a037","author":"Li Yinhao","year":"2023","unstructured":"Yinhao Li, Han Bao, Zheng Ge, Jinrong Yang, Jianjian Sun, and Zeming Li. 2023. BEVStereo: Enhancing Depth Estimation in Multi-View 3D Object Detection with Temporal Stereo. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a037. 1486\u20131494."},{"key":"e_1_3_2_1_15_1","volume-title":"Unifying Voxel-based Representation with Transformer for 3D Object Detection. 
arXiv preprint arXiv:2206.00630","author":"Li Yanwei","year":"2022","unstructured":"Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, and Jiaya Jia. 2022. Unifying Voxel-based Representation with Transformer for 3D Object Detection. arXiv preprint arXiv:2206.00630 (2022)."},{"key":"e_1_3_2_1_16_1","volume-title":"Bevdepth: Acquisition of reliable depth for multi-view 3d object detection. arXiv preprint arXiv:2206.10092","author":"Li Yinhao","year":"2022","unstructured":"Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, and Zeming Li. 2022. Bevdepth: Acquisition of reliable depth for multi-view 3d object detection. arXiv preprint arXiv:2206.10092 (2022)."},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a037","author":"Li Yinhao","year":"2023","unstructured":"Yinhao Li, Zheng Ge, Guanyi Yu, Jinrong Yang, Zengran Wang, Yukang Shi, Jianjian Sun, and Zeming Li. 2023. Bevdepth: Acquisition of reliable depth for multi-view 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a037. 
1477\u20131485."},{"key":"e_1_3_2_1_18_1","volume-title":"BEVFormer: Learning Bird\u2019s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers. arXiv preprint arXiv:2203.17270","author":"Li Zhiqi","year":"2022","unstructured":"Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, and Jifeng Dai. 2022. BEVFormer: Learning Bird\u2019s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers. arXiv preprint arXiv:2203.17270 (2022)."},{"key":"e_1_3_2_1_19_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=oMI9PjOb9Jl","author":"Liu Shilong","year":"2022","unstructured":"Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang. 2022. DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=oMI9PjOb9Jl"},{"key":"e_1_3_2_1_20_1","volume-title":"DAB-DETR: Dynamic anchor boxes are better queries for DETR. arXiv preprint arXiv:2201.12329","author":"Liu Shilong","year":"2022","unstructured":"Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang. 2022. 
DAB-DETR: Dynamic anchor boxes are better queries for DETR. arXiv preprint arXiv:2201.12329 (2022)."},{"key":"e_1_3_2_1_21_1","volume-title":"Petr: Position embedding transformation for multi-view 3d object detection. In Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327","author":"Liu Yingfei","year":"2022","unstructured":"Yingfei Liu, Tiancai Wang, Xiangyu Zhang, and Jian Sun. 2022. Petr: Position embedding transformation for multi-view 3d object detection. In Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327, 2022, Proceedings, Part XXVII. Springer, 531\u2013548."},{"key":"e_1_3_2_1_22_1","volume-title":"PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images. arXiv preprint arXiv:2206.01256","author":"Liu Yingfei","year":"2022","unstructured":"Yingfei Liu, Junjie Yan, Fan Jia, Shuailin Li, Qi Gao, Tiancai Wang, Xiangyu Zhang, and Jian Sun. 2022. PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images. arXiv preprint arXiv:2206.01256 (2022)."},{"key":"e_1_3_2_1_23_1","volume-title":"DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention. arXiv preprint arXiv:2212.07849","author":"Luo Zhipeng","year":"2022","unstructured":"
Zhipeng Luo, Changqing Zhou, Gongjie Zhang, and Shijian Lu. 2022. DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention. arXiv preprint arXiv:2212.07849 (2022)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00864"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00363"},{"key":"e_1_3_2_1_26_1","volume-title":"Time will tell: New outlooks and a baseline for temporal multi-view 3d object detection. arXiv preprint arXiv:2210.02443","author":"Park Jinhyung","year":"2022","unstructured":"Jinhyung Park, Chenfeng Xu, Shijia Yang, Kurt Keutzer, Kris Kitani, Masayoshi Tomizuka, and Wei Zhan. 2022. Time will tell: New outlooks and a baseline for temporal multi-view 3d object detection. arXiv preprint arXiv:2210.02443 (2022)."},{"key":"e_1_3_2_1_27_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017)."},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings, Part XIV 16","author":"Philion Jonah","year":"2020","unstructured":"Jonah Philion and Sanja Fidler. 2020. 
Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3d. In Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XIV 16. Springer, 194\u2013210."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00845"},{"key":"e_1_3_2_1_30_1","volume-title":"Conference on Robot Learning. PMLR, 1475\u20131485","author":"Wang Tai","year":"2022","unstructured":"Tai Wang, ZHU Xinge, Jiangmiao Pang, and Dahua Lin. 2022. Probabilistic and geometric depth: Detecting objects in perspective. In Conference on Robot Learning. PMLR, 1475\u20131485."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW54120.2021.00107"},{"key":"e_1_3_2_1_32_1","volume-title":"Conference on Robot Learning. PMLR, 180\u2013191","author":"Wang Yue","year":"2022","unstructured":"Yue Wang, Vitor\u00a0Campagnolo Guizilini, Tianyuan Zhang, Yilun Wang, Hang Zhao, and Justin Solomon. 2022. Detr3d: 3d object detection from multi-view images via 3d-to-2d queries. In Conference on Robot Learning. PMLR, 180\u2013191."},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the AAAI conference on artificial intelligence, Vol.\u00a036","author":"Wang Yingming","year":"2022","unstructured":"
Yingming Wang, Xiangyu Zhang, Tong Yang, and Jian Sun. 2022. Anchor detr: Query design for transformer-based detector. In Proceedings of the AAAI conference on artificial intelligence, Vol.\u00a036. 2567\u20132575."},{"key":"e_1_3_2_1_34_1","volume-title":"Sts: Surround-view temporal stereo for multi-view 3d detection. arXiv preprint arXiv:2208.10145","author":"Wang Zengran","year":"2022","unstructured":"Zengran Wang, Chen Min, Zheng Ge, Yinhao Li, Zeming Li, Hongyu Yang, and Di Huang. 2022. Sts: Surround-view temporal stereo for multi-view 3d detection. arXiv preprint arXiv:2208.10145 (2022)."},{"key":"e_1_3_2_1_35_1","volume-title":"BEVFormer v2: Adapting Modern Image Backbones to Bird\u2019s-Eye-View Recognition via Perspective Supervision. arXiv preprint arXiv:2211.10439","author":"Yang Chenyu","year":"2022","unstructured":"Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, 2022. BEVFormer v2: Adapting Modern Image Backbones to Bird\u2019s-Eye-View Recognition via Perspective Supervision. arXiv preprint arXiv:2211.10439 (2022)."},{"key":"e_1_3_2_1_36_1","unstructured":"Zeyu Yang Jiaqi Chen Zhenwei Miao Wei Li Xiatian Zhu and Li Zhang. 2022. DeepInteraction: 3D Object Detection via Modality Interaction. 
In NeurIPS."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_47"},{"key":"e_1_3_2_1_38_1","volume-title":"Efficient detr: improving end-to-end object detector with dense prior. arXiv preprint arXiv:2104.01318","author":"Yao Zhuyu","year":"2021","unstructured":"Zhuyu Yao, Jiangbo Ai, Boxun Li, and Chi Zhang. 2021. Efficient detr: improving end-to-end object detector with dense prior. arXiv preprint arXiv:2104.01318 (2021)."},{"key":"e_1_3_2_1_39_1","volume-title":"Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605","author":"Zhang Hao","year":"2022","unstructured":"Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel\u00a0M Ni, and Heung-Yeung Shum. 2022. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605 (2022)."},{"key":"e_1_3_2_1_40_1","volume-title":"Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159","author":"Zhu Xizhou","year":"2020","unstructured":"Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. 2020. Deformable detr: Deformable transformers for end-to-end object detection. 
arXiv preprint arXiv:2010.04159 (2020)."}],"event":{"name":"MMAsia '23: ACM Multimedia Asia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Tainan Taiwan","acronym":"MMAsia '23"},"container-title":["ACM Multimedia Asia 2023"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3595916.3626373","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3595916.3626373","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:48:40Z","timestamp":1750286920000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3595916.3626373"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,6]]},"references-count":40,"alternative-id":["10.1145\/3595916.3626373","10.1145\/3595916"],"URL":"https:\/\/doi.org\/10.1145\/3595916.3626373","relation":{},"subject":[],"published":{"date-parts":[[2023,12,6]]},"assertion":[{"value":"2024-01-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}