{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T16:41:39Z","timestamp":1781714499775,"version":"3.54.5"},"reference-count":51,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2023,10,26]],"date-time":"2023-10-26T00:00:00Z","timestamp":1698278400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"research and development of intelligent vehicle key technologies and industrialization projects based on new-energy vehicles","award":["TC210H02S"],"award-info":[{"award-number":["TC210H02S"]}]},{"name":"research and development of intelligent vehicle key technologies and industrialization projects based on new-energy vehicles","award":["20220301012GX"],"award-info":[{"award-number":["20220301012GX"]}]},{"name":"quantitative development and measurement technology research of expected functional safety based on vehicle\u2013cloud collaboration","award":["TC210H02S"],"award-info":[{"award-number":["TC210H02S"]}]},{"name":"quantitative development and measurement technology research of expected functional safety based on vehicle\u2013cloud collaboration","award":["20220301012GX"],"award-info":[{"award-number":["20220301012GX"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This paper proposes a multimodal fusion 3D target detection algorithm based on the attention mechanism to improve the performance of 3D target detection. The algorithm utilizes point cloud data and information from the camera. For image feature extraction, the ResNet50 + FPN architecture extracts features at four levels. Point cloud feature extraction employs the voxel method and FCN to extract point and voxel features. The fusion of image and point cloud features is achieved through regional point fusion and voxel fusion methods. 
After information fusion, the Coordinate and SimAM attention mechanisms extract fusion features at a deep level. The algorithm\u2019s performance is evaluated using the DAIR-V2X dataset. The results show that, compared to the Part-A2 algorithm, the proposed algorithm improves the mAP value by 7.9% in the BEV view and 7.8% in the 3D view at IOU = 0.5 (cars) and IOU = 0.25 (pedestrians and cyclists). At IOU = 0.7 (cars) and IOU = 0.5 (pedestrians and cyclists), the mAP value of the SECOND algorithm is improved by 5.4% in the BEV view and 4.3% in the 3D view, compared to other comparison algorithms.<\/jats:p>","DOI":"10.3390\/s23218732","type":"journal-article","created":{"date-parts":[[2023,10,26]],"date-time":"2023-10-26T07:22:15Z","timestamp":1698304935000},"page":"8732","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Multiattention Mechanism 3D Object Detection Algorithm Based on RGB and LiDAR Fusion for Intelligent Driving"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3674-4018","authenticated-orcid":false,"given":"Xiucai","family":"Zhang","sequence":"first","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lei","family":"He","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Junyi","family":"Chen","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Baoyun","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin 
University, Changchun 130022, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuhai","family":"Wang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuanle","family":"Zhou","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,10,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Wang, S., and Urtasun, R. (2018, January 8\u201314). Deep Continuous Fusion for Multi-sensor 3D Object Detection. Proceedings of the Computer Vision\u2014ECCV 2018\u201415th European Conference, Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_39"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 15\u201320). PointPillars: Fast Encoders for Object Detection From Point Clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. 
Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yang, B., Luo, W., and Urtasun, R. (2018, January 18\u201322). PIXOR: Real-Time 3D Object Detection From Point Clouds. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00798"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Jia, K. (2019, January 3\u20138). Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Garcia-Garcia, A., Gomez-Donoso, F., Garcia-Rodriguez, J., Orts-Escolano, S., Cazorla, M., and Azorin-Lopez, J. 
(2016, January 24\u201329). Pointnet: A 3d Convolutional Neural Network for Real-Time Object Class Recognition. Proceedings of the 2016 International Joint Conference on Neural Networks, Vancouver, BC, Canada.","DOI":"10.1109\/IJCNN.2016.7727386"},{"key":"ref_11","unstructured":"Wang, D.Z., and Posner, I. (2015, January 13\u201317). Voting for Voting in Online Point Cloud Object Detection. Proceedings of the Robotics: Science and Systems XI, Sapienza University of Rome, Rome, Italy."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1023\/A:1018628609742","article-title":"Least Squares Support Vector Machine Classifiers","volume":"9","author":"Suykens","year":"1999","journal-title":"Neural Process. Lett."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Engelcke, M., Rao, D., Wang, D.Z., Tong, C.H., and Posner, I. (June, January 29). Vote3Deep: Fast object detection in 3D point clouds using efficient convolutional neural networks. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA 2017), Singapore.","DOI":"10.1109\/ICRA.2017.7989161"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., and Paluri, M. (2015, January 11\u201318). Learning Spatiotemporal Features with 3D Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision 2015, Las Condes, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, B., Zhang, T., and Xia, T. (2016). Vehicle detection from 3d lidar using fully convolutional network. arXiv.","DOI":"10.15607\/RSS.2016.XII.042"},{"key":"ref_16","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201322). VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201320). PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_20","unstructured":"Shi, S., Wang, Z., Wang, X., and Li, H. (2019). Part-a2 net: 3d part-aware and aggregation neural network for object detection from point cloud. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Song, S., and Xiao, J. (2016, January 27\u201330). Deep sliding shapes for amodal 3D object detection in rgb-d images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.94"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Xiang, Y., Choi, W., Lin, Y., and Savarese, S. (2015, January 7\u201312). Data-driven 3d voxel patterns for object category recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298800"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"4256","DOI":"10.1109\/TCYB.2019.2933224","article-title":"Real-World ISAR Object Recognition Using Deep Multimodal Relation Learning","volume":"50","author":"Xue","year":"2020","journal-title":"IEEE Trans. Cybern."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"633","DOI":"10.1007\/s00138-011-0391-3","article-title":"Multi-View Traffic Sign Detection, Recognition, and 3D Localisation","volume":"25","author":"Timofte","year":"2014","journal-title":"Mach. Vis. Appl."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, B., Ouyang, W., Sheng, L., Zeng, X., and Wang, X. (2019, January 16\u201320). GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00111"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Mousavian, A., Anguelov, D., Flynn, J., and Ko\u0161eck\u00e1, J. (2017, January 21\u201326). 3D bounding box estimation using deep learning and geometry. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.597"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chabot, F., Chaouch, M., Rabarisoa, J., Teuliere, C., and Chateau, T. (2017, January 21\u201326). Deep MANTA: A Coarse-to-fine Many-task Network for Joint 2D and 3D Vehicle Analysis from Monocular Image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.198"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhu, M., Derpanis, K.G., Yang, Y., Brahmbhatt, S., Zhang, M., Phillips, C., Lecce, M., and Daniilidis, K. (June, January 31). 
Single Image 3D Object Detection and Pose Estimation for Grasping. Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA 2014), Hong Kong, China.","DOI":"10.1109\/ICRA.2014.6907430"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Manhardt, F., Kehl, W., and Gaidon, A. (2019, January 16\u201320). ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00217"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., and Urtasun, R. (2016, January 27\u201330). Monocular 3D Object Detection for Autonomous Driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.236"},{"key":"ref_31","unstructured":"Chen, X., Kundu, K., Zhu, Y., Berneshawi, A.G., Ma, H., Fidler, S., and Urtasun, R. (2015). Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Li, P., Chen, X., and Shen, S. (2019, January 16\u201320). Stereo r-cnn based 3d object detection for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00783"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, Y., Chao, W.L., Garg, D., Hariharan, B., Campbell, M., and Weinberger, K.Q. (2019, January 15\u201320). Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00864"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1023\/A:1007981719186","article-title":"Point signatures: A new representation for 3d object recognition","volume":"25","author":"Chua","year":"1997","journal-title":"Int. J. Comput. Vis."},{"key":"ref_35","unstructured":"Ba, J., Mnih, V., and Kavukcuoglu, K.J. (2014). Multiple object recognition with visual attention. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhao, X., Huang, T., Hu, R., Zhou, Y., and Bai, X. (2020, January 7\u201312). Tanet: Robust 3d object detection from point clouds with triple attention. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6837"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1115","DOI":"10.1109\/34.625113","article-title":"COSMOS-A representation scheme for 3D free-form objects","volume":"19","author":"Dorai","year":"1997","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Tuzel, O., Liu, M.-Y., Taguchi, Y., and Raghunathan, A. (2014, January 6\u201312). Learning to rank 3d features. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10590-1_34"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, B. (2017, January 24\u201328). 3d fully convolutional network for vehicle detection in point cloud. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8205955"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Premebida, C., Carreira, J., Batista, J., and Nunes, U. (2014, January 14\u201318). Pedestrian detection combining rgb and dense lidar data. 
Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Chicago, IL, USA.","DOI":"10.1109\/IROS.2014.6943141"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Gonzalez, A., Villalonga, G., Xu, J., Vazquez, D., Amores, J., and Lopez, A. (July, January 28). Multiview random forest of local experts combining rgb and lidar data for pedestrian detection. Proceedings of the 2015 IEEE Intelligent Vehicles Symposium (IV), Seoul, Republic of Korea.","DOI":"10.1109\/IVS.2015.7225711"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.1109\/JSEN.2021.3127626","article-title":"RangeLVDet: Boosting 3D Object Detection in LIDAR with Range Image and RGB Image","volume":"22","author":"Zhang","year":"2022","journal-title":"IEEE Sens. J."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhang, Y., and Wu, H. (2022, January 14\u201316). 3D Object Detection Based on Multi-view Adaptive Fusion. Proceedings of the 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), Dalian, China.","DOI":"10.1109\/IPEC54454.2022.9777488"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S. (2017). Joint 3D Proposal Generation and Object Detection from View Aggregation. arXiv.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_46","unstructured":"Rozenberszki, D., Litany, O., and Dai, A. (2023). UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. 
(2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_49","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"He, C., Zeng, H., Huang, J., Hua, X.S., and Zhang, L. (2020, January 13\u201319). Structure aware single-stage 3d object detection from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01189"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 13\u201319). PointPainting: Sequential Fusion for 3D Object Detection. 
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00466"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/21\/8732\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:12:05Z","timestamp":1760130725000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/21\/8732"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,26]]},"references-count":51,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["s23218732"],"URL":"https:\/\/doi.org\/10.3390\/s23218732","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,26]]}}}