{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T05:43:11Z","timestamp":1771911791300,"version":"3.50.1"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"8","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62441232, 62476068, U21B2038, 62306092, and 62306091"],"award-info":[{"award-number":["62441232, 62476068, U21B2038, 62306092, and 62306091"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Natural Science Foundation of Shandong Province, China","award":["ZR2024QF066 and ZR2023QF052"],"award-info":[{"award-number":["ZR2024QF066 and ZR2023QF052"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>\n            Accurately detecting objects in 3D scenes is crucial for autonomous driving. Although existing voxel-based methods have achieved remarkable progress, their performance on tail objects remains unsatisfactory. We identify two core issues contributing to this phenomenon: the detectors frequently misidentify some background elements as foreground objects, and there is a misalignment between the classification score and detection quality. To tackle these challenges, we introduce an object-level\n            <jats:bold>\u2013<\/jats:bold>\n            guided multi-modal 3D object detector with an object-guided feature fusion (OFF) module and a hierarchical sample selection (HSS) strategy, named OGMMDet. Specifically, OFF introduces rich image features to enhance the representation of objects while using an object distribution heatmap to suppress the background. 
This approach provides geometry clues for tail objects while providing category priors to filter out the background. HSS uses a local-to-global ranking approach to calculate the relative classification loss weights of all proposals. It assigns higher weights to proposals with higher IoU when optimizing classification branches. This ensures that the model focuses its optimization on these higher-quality proposals. Consequently, there is a positive correlation between the classification score and IoU. This method alleviates the misalignment between the classification score and detection quality. Extensive experiments on the KITTI and nuScenes benchmarks demonstrate the effectiveness of our OGMMDet, which achieves 45.61% and 68.96% mean average precision (mAP) on pedestrians and cyclists on the KITTI benchmark, respectively. Code is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ZhongJianPing1\/ogmmdet.git\">https:\/\/github.com\/ZhongJianPing1\/ogmmdet.git<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3744247","type":"journal-article","created":{"date-parts":[[2025,6,13]],"date-time":"2025-06-13T12:05:33Z","timestamp":1749816333000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Multi-Modal 3D Object Detector with Object-Guided Fusion and Hierarchical Sample Selection"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-1906-734X","authenticated-orcid":false,"given":"Jianping","family":"Zhong","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Weihai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9196-9818","authenticated-orcid":false,"given":"Zhaobo","family":"Qi","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Weihai, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8663-7429","authenticated-orcid":false,"given":"Kaiwen","family":"Duan","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9457-7956","authenticated-orcid":false,"given":"Yuanrong","family":"Xu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Weihai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0042-7074","authenticated-orcid":false,"given":"Weigang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Weihai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7542-296X","authenticated-orcid":false,"given":"Qingming","family":"Huang","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"11621","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Caesar Holger","year":"2020","unstructured":"Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. 2020. Nuscenes: A multimodal dataset for autonomous driving. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 11621\u201311631."},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"5706","DOI":"10.1109\/TIP.2022.3201469","article-title":"3D cascade RCNN: High quality object detection in point clouds","volume":"31","author":"Cai Qi","year":"2022","unstructured":"Qi Cai, Yingwei Pan, Ting Yao, and Tao Mei. 2022. 3D cascade RCNN: High quality object detection in point clouds. 
IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 31 (2022), 5706\u20135719.","journal-title":"IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society"},{"key":"e_1_3_1_4_2","first-page":"18067","volume-title":"Proceedings of IEEE\/CVF International Conference on Computer Vision","author":"Cai Qi","year":"2023","unstructured":"Qi Cai, Yingwei Pan, Ting Yao, Chong-Wah Ngo, and Tao Mei. 2023. Objectfusion: Multi-modal 3D object detection with object-centric fusion. In Proceedings of IEEE\/CVF International Conference on Computer Vision, 18067\u201318076."},{"key":"e_1_3_1_5_2","first-page":"221","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","volume":"36","author":"Chen Chen","year":"2022","unstructured":"Chen Chen, Zhe Chen, Jing Zhang, and Dacheng Tao. 2022. SASA: Semantics-augmented set abstraction for point-based 3D object detection. In Proceedings of AAAI Conference on Artificial Intelligence, Vol. 36, 221\u2013229."},{"key":"e_1_3_1_6_2","first-page":"68","volume-title":"Proceedings of European Conference on Computer Vision","author":"Chen Qi","year":"2020","unstructured":"Qi Chen, Lin Sun, Zhixin Wang, Kui Jia, and Alan Yuille. 2020. Object as hotspots: An anchor-free 3D object detection approach via firing of hotspots. In Proceedings of European Conference on Computer Vision, 68\u201384."},{"key":"e_1_3_1_7_2","unstructured":"Shoufa Chen Peize Sun Yibing Song and Ping Luo. 2022. Diffusiondet: Diffusion model for object detection. arXiv:2211.09788. Retrieved from https:\/\/arxiv.org\/abs\/2211.09788"},{"key":"e_1_3_1_8_2","first-page":"1907","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chen Xiaozhi","year":"2017","unstructured":"Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia. 2017. Multi-view 3D object detection network for autonomous driving. 
In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1907\u20131915."},{"key":"e_1_3_1_9_2","first-page":"5428","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chen Yukang","year":"2022","unstructured":"Yukang Chen, Yanwei Li, Xiangyu Zhang, Jian Sun, and Jiaya Jia. 2022. Focal sparse convolutional networks for 3D object detection. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 5428\u20135437."},{"key":"e_1_3_1_10_2","first-page":"1201","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","volume":"35","author":"Deng Jiajun","year":"2021","unstructured":"Jiajun Deng, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. 2021. Voxel R-CNN: Towards high performance voxel-based 3D object detection. In Proceedings of AAAI Conference on Artificial Intelligence, Vol. 35, 1201\u20131209."},{"key":"e_1_3_1_11_2","first-page":"3354","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Geiger Andreas","year":"2012","unstructured":"Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3354\u20133361."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3005434"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_14_2","first-page":"8469","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Hu Jordan S. K.","year":"2022","unstructured":"Jordan S. K. Hu, Tianshu Kuai, and Steven L. Waslander. 2022. Point density-aware voxels for lidar 3D object detection. 
In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 8469\u20138478."},{"key":"e_1_3_1_15_2","unstructured":"Junjie Huang and Guan Huang. 2022. BeVDet4D: Exploit temporal cues in multi-camera 3D object detection. arXiv:2203.17054. Retrieved from https:\/\/arxiv.org\/abs\/2203.17054"},{"key":"e_1_3_1_16_2","first-page":"35","volume-title":"Proceedings of European Conference on Computer Vision","author":"Huang Tengteng","year":"2020","unstructured":"Tengteng Huang, Zhe Liu, Xiwu Chen, and Xiang Bai. 2020. EPNet: Enhancing point features with image semantics for 3D object detection. In Proceedings of European Conference on Computer Vision, 35\u201352."},{"key":"e_1_3_1_17_2","first-page":"13408","volume-title":"Proceedings of IEEE International Conference on Robotics and Automation","author":"Jiang Tianyuan","year":"2021","unstructured":"Tianyuan Jiang, Nan Song, Huanyu Liu, Ruihao Yin, Ye Gong, and Jian Yao. 2021. VIC-Net: Voxelization information compensation network for point cloud 3D object detection. In Proceedings of IEEE International Conference on Robotics and Automation, 13408\u201313414."},{"key":"e_1_3_1_18_2","first-page":"1","volume-title":"Proceedings of IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Ku Jason","year":"2018","unstructured":"Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven L. Waslander. 2018. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of IEEE\/RSJ International Conference on Intelligent Robots and Systems, 1\u20138."},{"key":"e_1_3_1_19_2","first-page":"12697","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Lang Alex H.","year":"2019","unstructured":"Alex H. Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. 2019. Pointpillars: Fast encoders for object detection from point clouds. 
In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 12697\u201312705."},{"key":"e_1_3_1_20_2","first-page":"5613","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","author":"Liu Xianzhu","year":"2025","unstructured":"Xianzhu Liu, Xin Sun, Haozhe Xie, Zonglin Li, Ru Li, and Shengping Zhang. 2025. Multi-view consistent 3D panoptic scene understanding. In Proceedings of AAAI Conference on Artificial Intelligence, 5613\u20135621."},{"issue":"3","key":"e_1_3_1_21_2","first-page":"1306","article-title":"2D Semantic-guided semantic scene completion","volume":"33","author":"Liu Xianzhu","year":"2024","unstructured":"Xianzhu Liu, Haozhe Xie, Shengping Zhang, Hongxun Yao, Rongrong Ji, Liqiang Nie, and Dacheng Tao. 2024. 2D Semantic-guided semantic scene completion. International Journal of Computer Vision 33, 3 (2024), 1306\u20131325.","journal-title":"International Journal of Computer Vision"},{"key":"e_1_3_1_22_2","first-page":"12009","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Liu Ze","year":"2022","unstructured":"Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, et al. 2022. Swin Transformer v2: Scaling up capacity and resolution. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 12009\u201312019."},{"key":"e_1_3_1_23_2","first-page":"2774","volume-title":"Proceedings of IEEE International Conference on Robotics and Automation","author":"Liu Zhijian","year":"2023","unstructured":"Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela L. Rus, and Song Han. 2023. Bevfusion: Multi-task multi-sensor fusion with unified bird\u2019s-eye view representation. 
In Proceedings of IEEE International Conference on Robotics and Automation, 2774\u20132781."},{"key":"e_1_3_1_24_2","first-page":"2723","volume-title":"Proceedings of IEEE\/CVF International Conference on Computer Vision","author":"Mao Jiageng","year":"2021","unstructured":"Jiageng Mao, Minzhe Niu, Haoyue Bai, Xiaodan Liang, Hang Xu, and Chunjing Xu. 2021. Pyramid R-CNN: Towards better performance and adaptability for 3D object detection. In Proceedings of IEEE\/CVF International Conference on Computer Vision, 2723\u20132732."},{"key":"e_1_3_1_25_2","first-page":"3164","volume-title":"Proceedings of IEEE\/CVF International Conference on Computer Vision","author":"Mao Jiageng","year":"2021","unstructured":"Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, and Chunjing Xu. 2021. Voxel transformer for 3D object detection. In Proceedings of IEEE\/CVF International Conference on Computer Vision, 3164\u20133173."},{"key":"e_1_3_1_26_2","first-page":"14605","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Noh Jongyoun","year":"2021","unstructured":"Jongyoun Noh, Sanghoon Lee, and Bumsub Ham. 2021. HVPR: Hybrid voxel-point representation for single-stage 3D object detection. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14605\u201314614."},{"key":"e_1_3_1_27_2","first-page":"7463","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Pan Xuran","year":"2021","unstructured":"Xuran Pan, Zhuofan Xia, Shiji Song, Li Erran Li, and Gao Huang. 2021. 3D object detection with pointformer. 
In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 7463\u20137472."},{"key":"e_1_3_1_28_2","first-page":"10386","volume-title":"Proceedings of IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Pang Su","year":"2020","unstructured":"Su Pang, Daniel Morris, and Hayder Radha. 2020. CLOCs: Camera-LiDAR object candidates fusion for 3D object detection. In Proceedings of IEEE\/RSJ International Conference on Intelligent Robots and Systems, 10386\u201310393."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3289858"},{"key":"e_1_3_1_30_2","first-page":"918","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2018","unstructured":"Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas. 2018. Frustum PointNets for 3D object detection from RGB-D data. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 918\u2013927."},{"key":"e_1_3_1_31_2","first-page":"652","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R.","year":"2017","unstructured":"Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 652\u2013660."},{"key":"e_1_3_1_32_2","unstructured":"Charles Ruizhongtai Qi Li Yi Hao Su and Leonidas J. Guibas. 2017. PointNet++: Deep hierarchical feature learning on point sets in a metric space. 
In Proceedings of the 31st International Conference on Neural Information Processing Systems 5105\u20135114."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3634683"},{"issue":"2","key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3561824","article-title":"UEFPN: Unified and enhanced feature pyramid networks for small object detection","volume":"19","author":"Qiao Ziteng","year":"2023","unstructured":"Ziteng Qiao, Dianxi Shi, Xiaodong Yi, Yanyan Shi, Yuhui Zhang, and Yangyang Liu. 2023. UEFPN: Unified and enhanced feature pyramid networks for small object detection. ACM Transactions on Multimedia Computing, Communications and Applications 19, 2 (2023), 1\u201321.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3637214"},{"key":"e_1_3_1_36_2","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th International Conference on Neural Information Processing Systems Vol. 1 91\u201399."},{"key":"e_1_3_1_37_2","first-page":"10529","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Shi Shaoshuai","year":"2020","unstructured":"Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, and Hongsheng Li. 2020. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 10529\u201310538."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-022-01710-9"},{"key":"e_1_3_1_39_2","first-page":"770","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Shi Shaoshuai","year":"2019","unstructured":"Shaoshuai Shi, Xiaogang Wang, and Hongsheng Li. 2019. 
PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 770\u2013779."},{"key":"e_1_3_1_40_2","unstructured":"Shaoshuai Shi Zhe Wang Xiaogang Wang and Hongsheng Li. 2019. Part-a\u02c62 net: 3D part-aware and aggregation neural network for object detection from point cloud. arXiv:1907.03670. Retrieved from https:\/\/arxiv.org\/abs\/1907.03670"},{"key":"e_1_3_1_41_2","first-page":"2271","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","volume":"36","author":"Song Nan","year":"2022","unstructured":"Nan Song, Tianyuan Jiang, and Jian Yao. 2022. JPV-Net: Joint point-voxel representations for accurate 3D object detection. In Proceedings of AAAI Conference on Artificial Intelligence, Vol. 36, 2271\u20132279."},{"issue":"4","key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"2619","DOI":"10.1109\/TCSVT.2023.3306361","article-title":"GraphAlign++: An accurate feature alignment by graph matching for multi-modal 3D object detection","volume":"34","author":"Song Ziying","year":"2024","unstructured":"Ziying Song, Caiyan Jia, Lei Yang, Haiyue Wei, and Lin Liu. 2024. GraphAlign++: An accurate feature alignment by graph matching for multi-modal 3D object detection. IEEE Transactions on Circuits and Systems for Video Technology 34, 4 (2024), 2619\u20132632.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_1_43_2","first-page":"9627","volume-title":"Proceedings of IEEE\/CVF International Conference on Computer Vision","author":"Tian Zhi","year":"2019","unstructured":"Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. 2019. FCOS: Fully convolutional one-stage object detection. 
In Proceedings of IEEE\/CVF International Conference on Computer Vision, 9627\u20139636."},{"key":"e_1_3_1_44_2","first-page":"3631","volume-title":"Proceedings of IEEE\/CVF International Conference on Computer Vision","author":"Wang Keyang","year":"2021","unstructured":"Keyang Wang and Lei Zhang. 2021. Reconcile prediction consistency for balanced object detection. In Proceedings of IEEE\/CVF International Conference on Computer Vision, 3631\u20133640."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3462219"},{"key":"e_1_3_1_46_2","first-page":"16133","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Woo Sanghyun","year":"2023","unstructured":"Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. 2023. ConvNeXt v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 16133\u201316142."},{"key":"e_1_3_1_47_2","first-page":"21653","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wu Hai","year":"2023","unstructured":"Hai Wu, Chenglu Wen, Shaoshuai Shi, Xin Li, and Cheng Wang. 2023. Virtual sparse convolution for multimodal 3D object detection. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 21653\u201321662."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3248656"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3584362"},{"key":"e_1_3_1_50_2","first-page":"244","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Xu Danfei","year":"2018","unstructured":"Danfei Xu, Dragomir Anguelov, and Ashesh Jain. 2018. Pointfusion: Deep sensor fusion for 3D bounding box estimation. 
In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 244\u2013253."},{"key":"e_1_3_1_51_2","first-page":"2893","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","volume":"36","author":"Xu Qiangeng","year":"2022","unstructured":"Qiangeng Xu, Yiqi Zhong, and Ulrich Neumann. 2022. Behind the curtain: Learning occluded shapes for 3D object detection. In Proceedings of AAAI Conference on Artificial Intelligence, Vol. 36, 2893\u20132901."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.3390\/s18103337"},{"key":"e_1_3_1_53_2","first-page":"17830","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yang Chenyu","year":"2023","unstructured":"Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, et al. 2023. BEVFormer v2: Adapting modern image backbones to bird\u2019s-eye-view recognition via perspective supervision. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 17830\u201317839."},{"issue":"1","key":"e_1_3_1_54_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3539611","article-title":"Exploiting manifold feature representation for efficient classification of 3D point clouds","volume":"19","author":"Yang Dinghao","year":"2023","unstructured":"Dinghao Yang, Wei Gao, Ge Li, Hui Yuan, Junhui Hou, and Sam Kwong. 2023. Exploiting manifold feature representation for efficient classification of 3D point clouds. ACM Transactions on Multimedia Computing, Communications and Applications 19, 1 (2023), 1\u201321.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_55_2","first-page":"11040","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yang Zetong","year":"2020","unstructured":"Zetong Yang, Yanan Sun, Shu Liu, and Jiaya Jia. 2020. 
3DSSD: Point-based 3D single stage object detector. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 11040\u201311048."},{"key":"e_1_3_1_56_2","first-page":"1951","volume-title":"Proceedings of IEEE\/CVF International Conference on Computer Vision","author":"Yang Zetong","year":"2019","unstructured":"Zetong Yang, Yanan Sun, Shu Liu, Xiaoyong Shen, and Jiaya Jia. 2019. Std: Sparse-to-dense 3d object detector for point cloud. In Proceedings of IEEE\/CVF International Conference on Computer Vision, 1951\u20131960."},{"key":"e_1_3_1_57_2","first-page":"11784","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yin Tianwei","year":"2021","unstructured":"Tianwei Yin, Xingyi Zhou, and Philipp Krahenbuhl. 2021. Center-based 3D object detection and tracking. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 11784\u201311793."},{"key":"e_1_3_1_58_2","first-page":"720","volume-title":"Proceedings of European Conference on Computer Vision","author":"Yoo Jin Hyeok","year":"2020","unstructured":"Jin Hyeok Yoo, Yecheol Kim, Jisong Kim, and Jun Won Choi. 2020. 3D-CVF: Generating joint camera and lidar features using cross-view spatial feature fusion for 3D object detection. In Proceedings of European Conference on Computer Vision, 720\u2013736."},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3296583"},{"key":"e_1_3_1_60_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Zhang Hao","year":"2022","unstructured":"Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel Ni, and Heung-Yeung Shum. 2022. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. 
In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-023-01820-y"},{"key":"e_1_3_1_62_2","first-page":"18527","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Youmin","year":"2023","unstructured":"Youmin Zhang, Xianda Guo, Matteo Poggi, Zheng Zhu, Guan Huang, and Stefano Mattoccia. 2023. Completionformer: Depth completion with convolutions and vision transformers. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 18527\u201318536."},{"key":"e_1_3_1_63_2","first-page":"18953","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Yifan","year":"2022","unstructured":"Yifan Zhang, Qingyong Hu, Guoquan Xu, Yanxin Ma, Jianwei Wan, and Yulan Guo. 2022. Not all points are equal: Learning highly efficient point-based detectors for 3D lidar point clouds. In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 18953\u201318962."},{"key":"e_1_3_1_64_2","first-page":"3555","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","volume":"35","author":"Zheng Wu","year":"2021","unstructured":"Wu Zheng, Weiliang Tang, Sijin Chen, Li Jiang, and Chi-Wing Fu. 2021. CIA-SSD: Confident IoU-aware single-stage object detector from point cloud. In Proceedings of AAAI Conference on Artificial Intelligence, Vol. 35, 3555\u20133562."},{"key":"e_1_3_1_65_2","first-page":"14494","volume-title":"Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zheng Wu","year":"2021","unstructured":"Wu Zheng, Weiliang Tang, Li Jiang, and Chi-Wing Fu. 2021. SE-SSD: Self-ensembling single-stage object detector from point cloud. 
In Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14494\u201314503."},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3595916.3626385"},{"key":"e_1_3_1_67_2","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1109\/3DV.2019.00019","volume-title":"Proceedings of 2019 International Conference on 3D Vision (3DV)","author":"Zhou Dingfu","year":"2019","unstructured":"Dingfu Zhou, Jin Fang, Xibin Song, Chenye Guan, Junbo Yin, Yuchao Dai, and Ruigang Yang. 2019. IoU loss for 2D\/3D object detection. In Proceedings of 2019 International Conference on 3D Vision (3DV), 85\u201394."},{"key":"e_1_3_1_68_2","first-page":"923","volume-title":"Proceedings of Conference on Robot Learning","author":"Zhou Yin","year":"2020","unstructured":"Yin Zhou, Pei Sun, Yu Zhang, Dragomir Anguelov, Jiyang Gao, Tom Ouyang, James Guo, Jiquan Ngiam, and Vijay Vasudevan. 2020. End-to-end multi-view fusion for 3D object detection in lidar point clouds. In Proceedings of Conference on Robot Learning, 923\u2013932."},{"key":"e_1_3_1_69_2","first-page":"4490","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition","author":"Zhou Yin","year":"2018","unstructured":"Yin Zhou and Oncel Tuzel. 2018. VoxelNet: End-to-end learning for point cloud based 3D object detection. 
In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 4490\u20134499."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3744247","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T20:37:57Z","timestamp":1755031077000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3744247"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,12]]},"references-count":68,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3744247"],"URL":"https:\/\/doi.org\/10.1145\/3744247","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,12]]},"assertion":[{"value":"2024-05-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}