{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T13:04:35Z","timestamp":1763384675088,"version":"3.45.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022YFB4703405"],"award-info":[{"award-number":["2022YFB4703405"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62476080"],"award-info":[{"award-number":["62476080"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004608","name":"Jiangsu Province Natural Science Foundation","doi-asserted-by":"crossref","award":["BK20231186"],"award-info":[{"award-number":["BK20231186"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Key Laboratory about Maritime Intelligent Network Information Technology of the Ministry of Education","award":["EKLMIC202405"],"award-info":[{"award-number":["EKLMIC202405"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Sen. Netw."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Significant advancements have been made in neural networks for 3D object detection in autonomous driving. However, these vehicles often encounter small and occluded objects, leading to fewer available features and requirement of high positioning accuracy. Current approaches to 3D vehicle detection frequently overlook this challenge, simply feeding features into existing detection models. This paper introduces an innovative boosting multi-modal fusion for 3D vehicle object detection. Initially, we employ pre-trained 3D and 2D object detection models to generate 3D and 2D bounding boxes. Subsequently, a fusion strategy grounded in the rotation intersection ratio, merges two kinds of bounding boxes. To capture information from small objects, we develop a grouping-splitting residual network enhanced with coordinate attention, facilitating the extraction of more detailed information. Experimental results on KITTI dataset reveal that our method achieves a 90.46% accuracy for hard samples in Bird\u2019s eye view. Compared with the advanced multi-modal 3D object detection performance, such as CLOCs, HMFI and PointPainting, our accuracy in hard samples has improved by 1.1%, 1.84%, and 3.75%.<\/jats:p>","DOI":"10.1145\/3765739","type":"journal-article","created":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T11:07:35Z","timestamp":1758712055000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Boosting Multi-modal Fusion for 3D Vehicle Object Detection"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9114-5699","authenticated-orcid":false,"given":"Xinnan","family":"Fan","sequence":"first","affiliation":[{"name":"College of Information Science and Engineering, Hohai University - Changzhou Campus","place":["Changzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5388-751X","authenticated-orcid":false,"given":"Xinyang","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Hohai University - Changzhou Campus","place":["Changzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4467-7641","authenticated-orcid":false,"given":"Pengfei","family":"Shi","sequence":"additional","affiliation":[{"name":"College of Artificial Intelligence, Hohai University - Changzhou Campus","place":["Changzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-8837-052X","authenticated-orcid":false,"given":"Yuchen","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of information Science and Engineering, Hohai University - Changzhou Campus","place":["Changzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8710-511X","authenticated-orcid":false,"given":"Yuanxue","family":"Xin","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Hohai University - Changzhou Campus","place":["Changzhou, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,11,17]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"3D object detection from images for autonomous driving: A survey","author":"Ma Xinzhu","year":"2023","unstructured":"Xinzhu Ma, Wanli Ouyang, Andrea Simonelli, and Elisa Ricci. 2023. 3D object detection from images for autonomous driving: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2020.3020626"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8794195"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"126587","DOI":"10.1016\/j.neucom.2023.126587","article-title":"Multi-modality 3D object detection in autonomous driving: A review","author":"Tang Yingjuan","year":"2023","unstructured":"Yingjuan Tang, Hongwen He, Yong Wang, Zan Mao, and Haoyu Wang. 2023. Multi-modality 3D object detection in autonomous driving: A review. Neurocomputing (2023), 126587.","journal-title":"Neurocomputing"},{"key":"e_1_3_1_6_2","article-title":"Multi-modal 3D object detection in autonomous driving: A survey and taxonomy","author":"Wang Li","year":"2023","unstructured":"Li Wang, Xinyu Zhang, Ziying Song, Jiangfeng Bi, Guoxin Zhang, Haiyue Wei, Liyao Tang, Lei Yang, Jun Li, Caiyan Jia, and Lijun Zhao. 2023. Multi-modal 3D object detection in autonomous driving: A survey and taxonomy. IEEE Transactions on Intelligent Vehicles (2023).","journal-title":"IEEE Transactions on Intelligent Vehicles"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIV.2023.3240287"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.3390\/s18103337"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00472"},{"key":"e_1_3_1_10_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Jinyu","year":"2023","unstructured":"Jinyu Li, Chenxu Luo, and Xiaodong Yang. 2023. PillarNeXt: Rethinking network designs for 3D object detection in LiDAR point clouds. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 17567\u201317576."},{"key":"e_1_3_1_11_2","first-page":"652","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Charles R","year":"2017","unstructured":"Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. 2017. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 652\u2013660."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00086"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00204"},{"key":"e_1_3_1_14_2","first-page":"8469","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Hu Jordan SK","year":"2022","unstructured":"Jordan SK Hu, Tianshu Kuai, and Steven L Waslander. 2022. Point density-aware voxels for lidar 3d object detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8469\u20138478."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2023.3285308"},{"key":"e_1_3_1_16_2","first-page":"34899","article-title":"Fully convolutional one-stage 3d object detection on lidar range images","volume":"35","author":"Tian Zhi","year":"2022","unstructured":"Zhi Tian, Xiangxiang Chu, Xiaoming Wang, Xiaolin Wei, and Chunhua Shen. 2022. Fully convolutional one-stage 3d object detection on lidar range images. Advances in Neural Information Processing Systems 35 (2022), 34899\u201334911.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_17_2","first-page":"139","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Chen Yi-Ting","year":"2022","unstructured":"Yi-Ting Chen, Jinghao Shi, Zelin Ye, Christoph Mertz, Deva Ramanan, and Shu Kong. 2022. Multimodal object detection via probabilistic ensembling. In Proceedings of the European Conference on Computer Vision. Springer, 139\u2013158."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3023541"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.691"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00033"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"e_1_3_1_22_2","first-page":"720","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXVII 16","author":"Yoo Jin Hyeok","year":"2020","unstructured":"Jin Hyeok Yoo, Yecheol Kim, Jisong Kim, and Jun Won Choi. 2020. 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. In Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXVII 16. Springer, 720\u2013736."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01161"},{"key":"e_1_3_1_24_2","first-page":"444","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Qian Kun","year":"2021","unstructured":"Kun Qian, Shilin Zhu, Xinyu Zhang, and Li Erran Li. 2021. Robust multimodal vehicle detection in foggy weather using complementary lidar and radar signals. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 444\u2013453."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.3390\/app11125598"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2021.3122865"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341791"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2021.10.017"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2020.01.008"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(01)00103-0"},{"key":"e_1_3_1_31_2","first-page":"3093","volume-title":"Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP)","author":"Nabati Ramin","year":"2019","unstructured":"Ramin Nabati and Hairong Qi. 2019. Rrpn: Radar region proposal network for object detection in autonomous vehicles. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 3093\u20133097."},{"key":"e_1_3_1_32_2","first-page":"18381","article-title":"Learning high-precision bounding box for rotated object detection via kullback-leibler divergence","volume":"34","author":"Yang Xue","year":"2021","unstructured":"Xue Yang, Xiaojiang Yang, Jirui Yang, Qi Ming, Wentao Wang, Qi Tian, and Junchi Yan. 2021. Learning high-precision bounding box for rotated object detection via kullback-leibler divergence. Advances in Neural Information Processing Systems 34 (2021), 18381\u201318394.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1007\/978-3-030-58558-7_12","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part V 16","author":"Chen Zhiming","year":"2020","unstructured":"Zhiming Chen, Kean Chen, Weiyao Lin, John See, Hui Yu, Yan Ke, and Cong Yang. 2020. Piou loss: Towards accurate oriented object detection in complex environments. In Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part V 16. Springer, 195\u2013211."},{"key":"e_1_3_1_34_2","unstructured":"Xue Yang Yue Zhou Gefan Zhang Jirui Yang Wentao Wang Junchi Yan Xiaopeng Zhang and Qi Tian. 2022. The KFIoU loss for rotated object detection. arXiv:2201.12558. Retrieved from https:\/\/arxiv.org\/abs\/2201.12558 (2022)."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354978"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2938758"},{"key":"e_1_3_1_38_2","first-page":"2736","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Hang","year":"2022","unstructured":"Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R Manmatha, Mu Li, and Alexander Smola. 2022. Resnest: Split-attention networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2736\u20132746."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"e_1_3_1_41_2","first-page":"1970","article-title":"Se (3)-transformers: 3d roto-translation equivariant attention networks","volume":"33","author":"Fuchs Fabian","year":"2020","unstructured":"Fabian Fuchs, Daniel Worrall, Volker Fischer, and Max Welling. 2020. Se (3)-transformers: 3d roto-translation equivariant attention networks. Advances in Neural Information Processing Systems 33 (2020), 1970\u20131981.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3179507"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00102"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2018.8594049"},{"key":"e_1_3_1_45_2","first-page":"691","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Li Xin","year":"2022","unstructured":"Xin Li, Botian Shi, Yuenan Hou, Xingjiao Wu, Tianlong Ma, Yikang Li, and Liang He. 2022. Homogeneous multi-modal feature fusion and interaction for 3D object detection. In Proceedings of the European Conference on Computer Vision. Springer, 691\u2013707."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00721"}],"container-title":["ACM Transactions on Sensor Networks"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3765739","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T13:02:21Z","timestamp":1763384541000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3765739"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,17]]},"references-count":45,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3765739"],"URL":"https:\/\/doi.org\/10.1145\/3765739","relation":{},"ISSN":["1550-4859","1550-4867"],"issn-type":[{"type":"print","value":"1550-4859"},{"type":"electronic","value":"1550-4867"}],"subject":[],"published":{"date-parts":[[2025,11,17]]},"assertion":[{"value":"2024-03-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-19","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}