{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:39:13Z","timestamp":1760146753097,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2024,12,6]],"date-time":"2024-12-06T00:00:00Z","timestamp":1733443200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100011789","name":"International Cooperation Foundation of Jilin Province","doi-asserted-by":"publisher","award":["20210402074GH","2023SYF05","CXTD2023002"],"award-info":[{"award-number":["20210402074GH","2023SYF05","CXTD2023002"]}],"id":[{"id":"10.13039\/501100011789","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Public Welfare Science and Technology Foundation of Zhongshan City","award":["20210402074GH","2023SYF05","CXTD2023002"],"award-info":[{"award-number":["20210402074GH","2023SYF05","CXTD2023002"]}]},{"name":"Innovative Research Team Funding","award":["20210402074GH","2023SYF05","CXTD2023002"],"award-info":[{"award-number":["20210402074GH","2023SYF05","CXTD2023002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Three-dimensional object detection has been a key area of research in recent years because of its rich spatial information and superior performance in addressing occlusion issues. However, the performance of 3D object detection still lags significantly behind that of 2D object detection, owing to challenges such as difficulties in feature extraction and a lack of texture information. To address this issue, this study proposes a 3D object detection network, CaLiJD (Camera and Lidar Joint Contender for 3D object Detection), guided by two-dimensional detection results. 
CaLiJD creatively integrates advanced channel attention mechanisms with a novel bounding-box filtering method to improve detection accuracy, especially for small and occluded objects. Bounding boxes are detected by the 2D and 3D networks for the same object in the same scene as an associated pair. The detection results that satisfy the criteria are then fed into the fusion layer for training. In this study, a novel fusion network is proposed. It consists of numerous convolutions arranged in both sequential and parallel forms and includes a Grouped Channel Attention Module for extracting interactions among multi-channel information. Moreover, a novel bounding-box filtering mechanism was introduced, incorporating the normalized distance from the object to the radar as a filtering criterion within the process. Experiments were conducted using the KITTI 3D object detection benchmark. The results showed that a substantial improvement in mean Average Precision (mAP) was achieved by CaLiJD compared with the baseline single-modal 3D detection model, with an enhancement of 7.54%. Moreover, the improvement achieved by our method surpasses that of other classical fusion networks by an additional 0.82%. 
In particular, CaLiJD achieved mAP values of 73.04% and 59.86%, respectively, thus demonstrating state-of-the-art performance for challenging small-object detection tasks such as those involving cyclists and pedestrians.<\/jats:p>","DOI":"10.3390\/rs16234593","type":"journal-article","created":{"date-parts":[[2024,12,9]],"date-time":"2024-12-09T06:18:03Z","timestamp":1733725083000},"page":"4593","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["CaLiJD: Camera and LiDAR Joint Contender for 3D Object Detection"],"prefix":"10.3390","volume":"16","author":[{"given":"Jiahang","family":"Lyu","sequence":"first","affiliation":[{"name":"School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China"}]},{"given":"Yongze","family":"Qi","sequence":"additional","affiliation":[{"name":"School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China"}]},{"given":"Suilian","family":"You","sequence":"additional","affiliation":[{"name":"School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China"}]},{"given":"Jin","family":"Meng","sequence":"additional","affiliation":[{"name":"School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China"}]},{"given":"Xin","family":"Meng","sequence":"additional","affiliation":[{"name":"School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, China"}]},{"given":"Sarath","family":"Kodagoda","sequence":"additional","affiliation":[{"name":"Faculty of Engineering & Information Technology, University of Technology Sydney, Sydney, NSW 2007, Australia"}]},{"given":"Shifeng","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Optoelectronic Engineering, Changchun University of Science and Technology, Changchun 130022, 
China"},{"name":"Zhongshan Institute of Changchun University of Science and Technology, Zhongshan 528400, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3782","DOI":"10.1109\/TITS.2019.2892405","article-title":"A survey on 3d object detection methods for autonomous driving applications","volume":"20","author":"Arnold","year":"2019","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3781","DOI":"10.1109\/TIV.2023.3264658","article-title":"Multi-modal 3d object detection in autonomous driving: A survey and taxonomy","volume":"8","author":"Wang","year":"2023","journal-title":"IEEE Trans. Intell. Veh."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"108796","DOI":"10.1016\/j.patcog.2022.108796","article-title":"3D Object Detection for Autonomous Driving: A Survey","volume":"130","author":"Qian","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1007\/s11263-023-01784-z","article-title":"Multi-modal 3d object detection in autonomous driving: A survey","volume":"131","author":"Wang","year":"2023","journal-title":"Int. J. Comput. Vis."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhou, Y., He, Y., Zhu, H., Wang, C., Li, H., and Jiang, Q. (2021, January 19\u201325). Monocular 3d object detection: An extrinsic parameter free approach. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.00747"},{"key":"ref_7","unstructured":"Chong, Z., Ma, X., Zhang, H., Yue, Y., Li, H., Wang, Z., and Ouyang, W. (2022, January 25\u201329). Monodistill: Learning spatial features for monocular 3d object detection. Proceedings of the International Conference on Learning Representations (ICLR), Online."},{"key":"ref_8","unstructured":"You, Y., Wang, Y., Garg, D., Pleiss, G., Hariharan, B., and Campbell, M. (May, January 30). Pseudolidar++: Accurate depth for 3d object detection in autonomous driving. Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sun, J., Chen, L., Xie, Y., Zhang, S., Jiang, Q., Zhou, X., and Bao, H. (2020, January 16\u201320). Disp r-cnn: Stereo 3d object detection via shape prior guided instance disparity estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01056"},{"key":"ref_10","unstructured":"Liu, Y., Wang, L., and Liu, M. (June, January 30). Yolostereo3d: A step back to 2d for efficient stereo 3d detection. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, Y., Bao, H., Ge, Z., Yang, J., Sun, J., and Li, Z. (2023, January 7\u201311). Bevstereo: Enhancing depth estimation in multi-view 3d object detection with temporal stereo. Proceedings of the Annual AAAI Conference on Association for the Advancement of Artificial Intelligence (AAAI), Hawaii, HI, USA.","DOI":"10.1609\/aaai.v37i2.25234"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Li, Y., Yang, J., Sun, J., Bao, H., Ge, Z., and Xiao, L. (2023). 
Bevstereo++: Accurate depth estimation in multi-view 3d object detection via dynamic temporal stereo. arXiv.","DOI":"10.1609\/aaai.v37i2.25234"},{"key":"ref_13","unstructured":"Liu, Z., Tang, H., and Lin, Y. (2019, January 8\u201312). Point-voxel cnn for efficient 3d deep learning. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201322). VoxelNet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3337","DOI":"10.3390\/s18103337","article-title":"Second: Sparsely embedded convolutional detection","volume":"18","author":"Yan","year":"2018","journal-title":"Sensors"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Shi, S., Li, H., Deng, J., Wang, Z., Guo, C., Shi, J., and Wang, X. (2021). PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection. arXiv.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_17","unstructured":"Chen, Y., Liu, S., Shen, X., and Jia, J. (November, January 27). Fast point r-cnn. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Li, Z., Wang, F., and Wang, N. (2021, January 19\u201325). Lidar r-cnn: An efficient and universal 3d object detector. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.00746"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3d object detection network for autonomous driving. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 16\u201320). Pointpainting: Sequential fusion for 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"ref_21","unstructured":"Liu, Z., Tang, H., Amini, A., Yang, X., and Mao, H. (June, January 29). Bevfusion: Multi-task multi-sensor fusion with unified bird\u2019s-eye view representation. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), London, UK."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, Y., Yu, A., Meng, T., Ben, C., and Ngiam, J. (2022, January 26). Deepfusion: Lidar-camera deep fusion for multi-modal 3d object detection. Proceedings of the IEEE\/RSJ Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan.","DOI":"10.1109\/CVPR52688.2022.01667"},{"key":"ref_23","unstructured":"Huang, J., Ye, Y., Liang, Z., Shan, Y., and Du, D. (October, January 29). Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection. Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.3390\/app9061065","article-title":"A Novel Interactive Fusion Method with Images and Point Clouds for 3D Object Detection","volume":"9","author":"Xu","year":"2019","journal-title":"Appl. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.3390\/app14041348","article-title":"Multi-Layer Fusion 3D Object Detection via Lidar Point Cloud and Camera Image","volume":"14","author":"Guo","year":"2024","journal-title":"Appl. 
Sci."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"7243","DOI":"10.3390\/s20247243","article-title":"Cascaded Cross-Modality Fusion Network for 3D Object Detection","volume":"20","author":"Chen","year":"2020","journal-title":"Sensors"},{"key":"ref_27","unstructured":"Yang, Q., Liu, F., Qu, J., Jing, H., Kuang, B., and Chai, W. (2021, January 3\u20135). Multi-sensor fusion of sparse point clouds based on neural networks. Proceedings of the International Conference on Robotics, Intelligent Control and Artificial Intelligence (RICAI), Guilin, China."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1483","DOI":"10.1109\/TPAMI.2019.2956516","article-title":"Cascade r-cnn: High quality object detection and instance segmentation","volume":"43","author":"Cai","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Varghese, R., and S, M. (2024, January 18\u201319). YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceeding of the International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India.","DOI":"10.1109\/ADICS58448.2024.10533619"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Simon, M., Milz, S., and Amende, K. (2018). Complex-YOLO: Real-time 3D Object Detection on Point Clouds. 
arXiv.","DOI":"10.1109\/CVPRW.2019.00158"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lu, Y., Hao, X., Li, Y., Chai, W., Sun, S., and Velipasalar, S. (2024). Range-Aware Attention Network for LiDAR-Based 3D Object Detection With Auxiliary Point Density Level Estimation. IEEE Transactions on Vehicular Technology, IEEE.","DOI":"10.1109\/TVT.2024.3454607"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lang, A., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 16\u201321). PointPillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Shi, W., and Rajkumar, R. (2020, January 16\u201320). Point-GNN: Graph neural network for 3D object detection in a point cloud. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00178"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1364\/JOSAA.511948","article-title":"EFNet: Enhancing feature information for 3D object detection in LiDAR point clouds","volume":"4","author":"Meng","year":"2024","journal-title":"J. Opt. Soc. Am. A"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., and Jia, J. (2020, January 16\u201320). 3DSSD: Point-Based 3D Single Stage Object Detector. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01105"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., and Shi, J. (2020, January 16\u201320). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 16\u201321). Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pan, X., Xia, Z., Song, S., Li, E., and Huang, G. (2021, January 19\u201325). 3D Object Detection with Pointformer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.00738"},{"key":"ref_41","unstructured":"Vishwanath, A., Zhou, Y., and Oncel, T. (2019, January 20\u201324). MVX-Net: Multimodal VoxelNet for 3D Object Detection. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Jia, K. (2019, January 3\u20138). Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. Proceedings of the IEEE\/RSJ Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"ref_43","unstructured":"Jin, H., Kim, Y., Kim, J., and Choi, W. (October, January 29). 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, Scotland."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Qi, C., Liu, W., Wu, C., Su, H., and Leonidas, J. (2018, January 18\u201322). Frustum PointNets for 3D Object Detection from RGB-D Data. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_45","unstructured":"Liang, M., Yang, B., Wang, S., and Urtasun, R. (2018, January 18\u201322). Deep continuous fusion for multi-sensor 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Xie, L., Xiang, C., and Yu, Z. (2020, January 7\u201312). PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6933"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Shen, Y., Li, H., Zhao, X., Yang, M., Tan, W., Pu, S., and Mao, H. (2022, January 24\u201327). Maff-net: Filter false positive for 3d vehicle detection with multi-modal adaptive feature fusion. Proceedings of the IEEE International Conference on Intelligent Transportation(ICITS) Systems, Xiamen, China.","DOI":"10.1109\/ITSC55140.2022.9922104"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Huang, T., Liu, Z., Chen, X., and Bai, X. (2020, January 16\u201320). Epnet: Enhancing point features with image semantics for 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1007\/978-3-030-58555-6_3"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Pang, S., Morris, D., and Radha, H. (2020, January 24). Clocs: Camera-lidar object candidates fusion for 3d object detection. Proceedings of the IEEE\/RSJ Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9341791"},{"key":"ref_50","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (November, January 27). 
STD: Sparse-to-dense 3d object detector for point cloud. Proceedings of the IEEE\/CVF International Conference on Computer Vision(ICCV), Seoul, Republic of Korea."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Shu, L., Shen, X., and Jia, J. (2018). IPOD: Intensive Point-based Object Detector for Point Cloud. arXiv.","DOI":"10.1109\/ICCV.2019.00204"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S. (2018, January 1\u20135). Joint 3D proposal generation and object detection from view aggregation. Proceedings of the IEEE\/RSJ Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/23\/4593\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:49:05Z","timestamp":1760114945000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/23\/4593"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,6]]},"references-count":52,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["rs16234593"],"URL":"https:\/\/doi.org\/10.3390\/rs16234593","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2024,12,6]]}}}