{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:58:59Z","timestamp":1760151539395,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,3,26]],"date-time":"2022-03-26T00:00:00Z","timestamp":1648252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>3D object detection with LiDAR and camera fusion has always been a challenge for autonomous driving. This work proposes a deep neural network (namely FuDNN) for LiDAR\u2013camera fusion 3D object detection. Firstly, a 2D backbone is designed to extract features from camera images. Secondly, an attention-based fusion sub-network is designed to fuse the features extracted by the 2D backbone and the features extracted from 3D LiDAR point clouds by PointNet++. Besides, the FuDNN, which uses the RPN and the refinement work of PointRCNN to obtain 3D box predictions, was tested on the public KITTI dataset. Experiments on the KITTI validation set show that the proposed FuDNN achieves AP values of 92.48, 82.90, and 80.51 at easy, moderate, and hard difficulty levels for car detection. The proposed FuDNN improves the performance of LiDAR\u2013camera fusion 3D object detection in the car category of the public KITTI dataset.<\/jats:p>","DOI":"10.3390\/info13040169","type":"journal-article","created":{"date-parts":[[2022,3,27]],"date-time":"2022-03-27T21:27:51Z","timestamp":1648416471000},"page":"169","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":25,"title":["A LiDAR\u2013Camera Fusion 3D Object Detection Algorithm"],"prefix":"10.3390","volume":"13","author":[{"given":"Leyuan","family":"Liu","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Jian","family":"He","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"},{"name":"Beijing Engineering Research Center for IOT Software and Systems, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Keyan","family":"Ren","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"},{"name":"Beijing Engineering Research Center for IOT Software and Systems, Beijing University of Technology, Beijing 100124, China"}]},{"given":"Zhonghua","family":"Xiao","sequence":"additional","affiliation":[{"name":"Suzhou Exinova Robot Technology Co., Ltd., Suzhou 215163, China"}]},{"given":"Yibin","family":"Hou","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China"},{"name":"Beijing Engineering Research Center for IOT Software and Systems, Beijing University of Technology, Beijing 100124, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 16\u201317). Pointpillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_2","unstructured":"Wang, Y., Mao, Q., Zhu, H., Zhang, Y., Ji, J., and Zhang, Y. (2021). Multi-modal 3d object detection in autonomous driving: A survey. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_5","first-page":"91","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 21\u201326). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"103295","DOI":"10.1016\/j.cviu.2021.103295","article-title":"Deep structural information fusion for 3D object detection on LiDAR\u2013camera system","volume":"214","author":"An","year":"2022","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Reading, C., Harakeh, A., Chae, J., and Waslander, S.L. (2021, January 20\u201325). Categorical depth distribution network for monocular 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00845"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lu, Y., Ma, X., Yang, L., Zhang, T., Liu, Y., Chu, Q., Yan, J., and Ouyang, W. (2021, January 11\u201317). Geometry uncertainty projection network for monocular 3d object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00310"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"4338","DOI":"10.1109\/TPAMI.2020.3005434","article-title":"Deep learning for 3d point clouds: A survey","volume":"43","author":"Guo","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201323). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020, January 13\u201319). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_15","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_16","first-page":"5105","article-title":"Pointnet++: Deep hierarchical feature learning on point sets in a metric space","volume":"30","author":"Qi","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201320). Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (2019, January 27\u201328). Std: Sparse-to-dense 3d object detector for point cloud. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00204"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shi, W., and Rajkumar, R. (2020, January 13\u201319). Point-gnn: Graph neural network for 3d object detection in a point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00178"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mao, J., Niu, M., Bai, H., Liang, X., Xu, H., and Xu, C. (2021, January 11\u201317). Pyramid r-cnn: Towards better performance and adaptability for 3d object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00272"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"22080","DOI":"10.1109\/ACCESS.2021.3055491","article-title":"Fast and accurate 3D object detection for lidar-camera-based autonomous vehicles using one shared voxel-based backbone","volume":"9","author":"Wen","year":"2021","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lu, H., Chen, X., Zhang, G., Zhou, Q., Ma, Y., and Zhao, Y. (2019, January 12\u201317). SCANet: Spatial-channel attention network for 3D object detection. Proceedings of the ICASSP 2019\u20132019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682746"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Wang, S., and Urtasun, R. (2018, January 8\u201314). Deep continuous fusion for multi-sensor 3d object detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_39"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3d proposal generation and object detection from view aggregation. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201322). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yoo, J.H., Kim, Y., Kim, J., and Choi, J.W. (2020, January 23\u201328). 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58583-9_43"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Huang, T., Liu, Z., Chen, X., and Bai, X. (2020, January 23\u201328). Epnet: Enhancing point features with image semantics for 3d object detection. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58555-6_3"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Xie, L., Xiang, C., Yu, Z., Xu, G., Yang, Z., Cai, D., and He, X. (2020, January 7\u201312). PI-RCNN: An efficient multi-sensor 3D object detector with point-based attentive cont-conv fusion module. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6933"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"6655","DOI":"10.1109\/TII.2020.3048719","article-title":"Three-attention mechanisms for one-stage 3-d object detection based on LiDAR and camera","volume":"17","author":"Wen","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The kitti vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_32","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_34","unstructured":"Da, K. (2014). A method for stochastic optimization. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_36","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/4\/169\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:43:48Z","timestamp":1760136228000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/4\/169"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,26]]},"references-count":37,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["info13040169"],"URL":"https:\/\/doi.org\/10.3390\/info13040169","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2022,3,26]]}}}