{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T09:44:03Z","timestamp":1779356643903,"version":"3.51.4"},"reference-count":40,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,3,6]],"date-time":"2023-03-06T00:00:00Z","timestamp":1678060800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Youth Innovation Promotion Association, CAS"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>A multi-modal 3D object-detection method, based on data from cameras and LiDAR, has become a subject of research interest. PointPainting proposes a method for improving point-cloud-based 3D object detectors using semantic information from RGB images. However, this method still needs to improve on the following two complications: first, there are faulty parts in the image semantic segmentation results, leading to false detections. Second, the commonly used anchor assigner only considers the intersection over union (IoU) between the anchors and ground truth boxes, meaning that some anchors contain few target LiDAR points assigned as positive anchors. In this paper, three improvements are suggested to address these complications. Specifically, a novel weighting strategy is proposed for each anchor in the classification loss. This enables the detector to pay more attention to anchors containing inaccurate semantic information. Then, SegIoU, which incorporates semantic information, instead of IoU, is proposed for the anchor assignment. SegIoU measures the similarity of the semantic information between each anchor and ground truth box, avoiding the defective anchor assignments mentioned above. In addition, a dual-attention module is introduced to enhance the voxelized point cloud. The experiments demonstrate that the proposed modules obtained significant improvements in various methods, consisting of single-stage PointPillars, two-stage SECOND-IoU, anchor-base SECOND, and an anchor-free CenterPoint on the KITTI dataset.<\/jats:p>","DOI":"10.3390\/s23052868","type":"journal-article","created":{"date-parts":[[2023,3,7]],"date-time":"2023-03-07T01:43:35Z","timestamp":1678153415000},"page":"2868","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["PointPainting: 3D Object Detection Aided by Semantic Image Information"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2926-8620","authenticated-orcid":false,"given":"Zhentong","family":"Gao","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1266-0324","authenticated-orcid":false,"given":"Qiantong","family":"Wang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5041-3300","authenticated-orcid":false,"given":"Zongxu","family":"Pan","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2998-4333","authenticated-orcid":false,"given":"Zhenyu","family":"Zhai","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hui","family":"Long","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Yoo, J.H., Kim, Y., Kim, J., and Choi, J.W. (2020, January 23\u201328). 3d-cvf: Generating joint camera and LiDAR features using cross-view spatial feature fusion for 3d object detection. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58583-9_43"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Xie, L., Xiang, C., Yu, Z., Xu, G., Yang, Z., Cai, D., and He, X. (2020, January 7\u201312). PI-RCNN: An efficient multi-sensor 3D object detector with point-based attentive cont-conv fusion module. Proceedings of the AAAI conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6933"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Huang, T., Liu, Z., Chen, X., and Bai, X. (2020, January 23\u201328). Epnet: Enhancing point features with image semantics for 3d object detection. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58555-6_3"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Pang, S., Morris, D., and Radha, H. (2020, January 24\u201329). CLOCs: Camera-LiDAR object candidates fusion for 3D object detection. Proceedings of the 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA.","DOI":"10.1109\/IROS45743.2020.9341791"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zheng, W., Tang, W., Jiang, L., and Fu, C.W. (2021, January 20\u201325). SE-SSD: Self-ensembling single-stage object detector from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01426"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020, January 14\u201319). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The kitti vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 14\u201319). Pointpainting: Sequential fusion for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_10","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst., 28."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Jia, K. (2019, January 3\u20138). Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. Proceedings of the 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shin, K., Kwon, Y.P., and Tomizuka, M. (2019, January 9\u201312). Roarnet: A robust 3d object detection based on region approximation refinement. Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France.","DOI":"10.1109\/IVS.2019.8813895"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Paigwar, A., Sierra-Gonzalez, D., Erkent, \u00d6., and Laugier, C. (2021, January 10\u201317). Frustum-pointpillars: A multi-stage approach for 3d object detection using rgb camera and lidar. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00327"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Du, X., Ang, M.H., Karaman, S., and Rus, D. (2018, January 21\u201325). A general pipeline for 3d detection of vehicles. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461232"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Xu, S., Zhou, D., Fang, J., Yin, J., Bin, Z., and Zhang, L. (2021, January 19\u201322). Fusionpainting: Multimodal fusion with adaptive attention for 3d object detection. Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA.","DOI":"10.1109\/ITSC48978.2021.9564951"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Simon, M., Amende, K., Kraus, A., Honer, J., Samann, T., Kaulbersch, H., Milz, S., and Michael Gross, H. (2019, January 16\u201317). Complexer-yolo: Real-time 3d object detection and tracking on semantic point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00158"},{"key":"ref_18","first-page":"16494","article-title":"Multimodal virtual point 3d detection","volume":"34","author":"Yin","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_19","unstructured":"Mao, J., Shi, S., Wang, X., and Li, H. (2022). 3D object detection for autonomous driving: A review and new outlooks. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Chen, Y., Hu, R., and Urtasun, R. (2019, January 15\u201320). Multi-task multi-sensor fusion for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00752"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., Zhou, Y., and Tuzel, O. (2019, January 20\u201324). MVX-Net: Multimodal VoxelNet for 3D Object Detection. Proceedings of the International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794195"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, Y., Yu, A.W., Meng, T., Caine, B., Ngiam, J., Peng, D., Shen, J., Lu, Y., Zhou, D., and Le, Q.V. (2022, January 18\u201324). Deepfusion: Lidar-camera deep fusion for multi-modal 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01667"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Chen, J., and Huang, D. (2022, January 18\u201324). Cat-det: Contrastively augmented transformer for multi-modal 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00098"},{"key":"ref_24","unstructured":"Chen, X., Zhang, T., Wang, Y., Wang, Y., and Zhao, H. (2022). Futr3d: A unified sensor fusion framework for 3d detection. arXiv."},{"key":"ref_25","unstructured":"Liu, Z., Tang, H., Amini, A., Yang, X., Mao, H., Rus, D., and Han, S. (2022). BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird\u2019s-Eye View Representation. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Li, Y., Qi, X., Chen, Y., Wang, L., Li, Z., Sun, J., and Jia, J. (2022, January 18\u201324). Voxel field fusion for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00119"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Bai, X., Hu, Z., Zhu, X., Huang, Q., Chen, Y., Fu, H., and Tai, C.L. (2022, January 18\u201324). Transfusion: Robust lidar-camera fusion for 3d object detection with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00116"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wang, C., Ma, C., Zhu, M., and Yang, X. (2021, January 20\u201325). Pointaugmenting: Cross-modal augmentation for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01162"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Xu, D., Anguelov, D., and Jain, A. (2018, January 18\u201323). Pointfusion: Deep sensor fusion for 3d bounding box estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00033"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3d proposal generation and object detection from view aggregation. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_32","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Pang, S., Morris, D., and Radha, H. (2022, January 3\u20138). Fast-CLOCs: Fast camera-LiDAR object candidates fusion for 3D object detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV51458.2022.00380"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 15\u201320). Pointpillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Yin, T., Zhou, X., and Krahenbuhl, P. (2021, January 20\u201325). Center-based 3d object detection and tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01161"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201323). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 17\u201324). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3337","DOI":"10.3390\/s18103337","article-title":"Second: Sparsely embedded convolutional detection","volume":"18","author":"Yan","year":"2018","journal-title":"Sensors"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2868\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:49:10Z","timestamp":1760122150000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2868"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,6]]},"references-count":40,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23052868"],"URL":"https:\/\/doi.org\/10.3390\/s23052868","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,6]]}}}