{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T16:33:22Z","timestamp":1764174802080,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2020,6,11]],"date-time":"2020-06-11T00:00:00Z","timestamp":1591833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100013061","name":"Jilin Scientific and Technological Development Program","doi-asserted-by":"publisher","award":["JJKH20200780KJ"],"award-info":[{"award-number":["JJKH20200780KJ"]}],"id":[{"id":"10.13039\/501100013061","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In this paper, we propose a novel 3D object detector KDA3D, which achieves high-precision and robust classification, segmentation, and localization with the help of key-point densification and multi-attention guidance. The proposed end-to-end neural network architecture takes LIDAR point clouds as the main inputs that can be optionally complemented by RGB images. It consists of three parts: part-1 segments 3D foreground points and generates reliable proposals; part-2 (optional) enhances point cloud density and reconstructs the more compact full-point feature map; part-3 refines 3D bounding boxes and adds semantic segmentation as extra supervision. Our designed lightweight point-wise and channel-wise attention modules can adaptively strengthen the \u201cskeleton\u201d and \u201cdistinctiveness\u201d point-features to help feature learning networks capture more representative or finer patterns. 
The proposed key-point densification component can generate pseudo-point clouds containing target information from monocular images through the distance preference strategy and K-means clustering so as to balance the density distribution and enrich sparse features. Extensive experiments on the KITTI and nuScenes 3D object detection benchmarks show that our KDA3D produces state-of-the-art results while running in near real-time with a low memory footprint.<\/jats:p>","DOI":"10.3390\/rs12111895","type":"journal-article","created":{"date-parts":[[2020,6,15]],"date-time":"2020-06-15T05:56:27Z","timestamp":1592200587000},"page":"1895","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["KDA3D: Key-Point Densification and Multi-Attention Guidance for 3D Object Detection"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0377-8083","authenticated-orcid":false,"given":"Jiarong","family":"Wang","sequence":"first","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"},{"name":"Changchun University of Science and Technology, Changchun 130022, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Ming","family":"Zhu","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]},{"given":"Bo","family":"Wang","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Deyao","family":"Sun","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, 
China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Hua","family":"Wei","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Changji","family":"Liu","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Haitao","family":"Nie","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,6,11]]},"reference":[{"key":"ref_1","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017, January 4\u20139). PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019). PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. 
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. (2019). NuScenes: A multimodal dataset for autonomous driving. arXiv.","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018). VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely Embedded Convolutional Detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_7","unstructured":"Li, B., Zhang, T., and Xia, T. (2016). Vehicle Detection from 3D Lidar Using Fully Convolutional Network. Robot. Sci. Syst. XII."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yang, B., Luo, W., and Urtasun, R. (2018). Pixor: Real-time 3D Object Detection from Point Clouds. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2018.00798"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Simon, M., Milz, S., Amende, K., and Gross, H.-M. (2019). Complex-YOLO: A Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds. Appl. Evol. Comput., 197\u2013209.","DOI":"10.1007\/978-3-030-11009-3_11"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, Z., Shi, J., Wang, X., and Li, H. (2020). 
From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network. IEEE Trans. Pattern Anal. Mach. Intell., 1.","DOI":"10.1109\/TPAMI.2020.2977026"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019). Pointpillars: Fast Encoders for Object Detection from Point Clouds. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollar, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Charles, R.Q., Su, H., Kaichun, M., and Guibas, L.J. (2017). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2017.16"},{"key":"ref_15","unstructured":"Li, Y., Bu, R., Sun, M., Wu, W., Di, X., and Chen, B. (2018, January 3\u20138). Pointcnn: Convolution on x-transformed points. 
Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (2019). STD: Sparse-to-Dense 3D Object Detector for Point Cloud. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/ICCV.2019.00204"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., and Jia, J. (2019). 3DSSD: Point-based 3D Single Stage Object Detector. arXiv.","DOI":"10.1109\/CVPR42600.2020.01105"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018). Frustum PointNets for 3D Object Detection from RGB-D Data. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shin, K., Kwon, Y.P., and Tomizuka, M. (2019, January 9\u201312). RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement. 2019 IEEE Intelligent Vehicles Symposium, Paris, France.","DOI":"10.1109\/IVS.2019.8813895"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Jia, K. (2019). Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection. arXiv, 1742\u20131749.","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017). Multi-view 3D Object Detection Network for Autonomous Driving. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018). Joint 3D Proposal Generation and Object Detection from View Aggregation. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_24","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Wang, S., and Urtasun, R. (2018). Deep Continuous Fusion for Multi-sensor 3D Object Detection. Proceedings of the Applications of Evolutionary Computation, Springer Science and Business Media LLC.","DOI":"10.1007\/978-3-030-01270-0_39"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Chen, Y., Hu, R., and Urtasun, R. (2019). Multi-Task Multi-Sensor Fusion for 3D Object Detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2019.00752"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xie, L., Xiang, C., Yu, Z., Xu, G., Yang, Z., Cai, D., and He, X. (2019). PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module. 
arXiv.","DOI":"10.1609\/aaai.v34i07.6933"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2016, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"18006","DOI":"10.1038\/lsa.2018.6","article-title":"A real-time detection and positioning method for small and weak targets using a 1D morphology-based approach in 2D images","volume":"7","author":"Wei","year":"2018","journal-title":"Light. Sci. Appl."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"90801","DOI":"10.1109\/ACCESS.2019.2927012","article-title":"MCF3D: Multi-Stage Complementary Fusion for Multi-Sensor 3D Object Detection","volume":"7","author":"Wang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (2018). IPOD: Intensive Point-based Object Detector for Point Cloud. arXiv.","DOI":"10.1109\/ICCV.2019.00204"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Jiang, M., Wu, Y., and Lu, C. (2018). PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation. arXiv.","DOI":"10.1109\/IGARSS.2019.8900102"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018). Squeeze-and-Excitation Networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018). CBAM: Convolutional Block Attention Module. 
Proceedings of the Applications of Evolutionary Computation, Springer Science and Business Media LLC.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., and Tang, X. (2017). Residual Attention Network for Image Classification. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2017.683"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhang, H., Xiao, J., Nie, L., Shao, J., Liu, W., and Chua, T.-S. (2017). SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/CVPR.2017.667"},{"key":"ref_37","unstructured":"Nair, V., and Hinton, G.E. (2010, January 21\u201324). Rectified linear units improve restricted boltzmann machines. Proceedings of the 27th international conference on machine learning (ICML-10), Haifa, Israel."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1109\/3477.764879","article-title":"Genetic K-means algorithm","volume":"29","author":"Krishna","year":"1999","journal-title":"IEEE Trans. Syst. Man Cybern. Part B"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond K-means","volume":"31","author":"Jain","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., and Urtasun, R. (2016, January 20\u201325). Monocular 3D Object Detection for Autonomous Driving. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR.2016.236"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1259","DOI":"10.1109\/TPAMI.2017.2706685","article-title":"3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection","volume":"40","author":"Ma","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","unstructured":"Kingma, D.P., and Ba, J.J.A. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_44","unstructured":"Yoo, J.H., Kim, Y., Kim, J.S., and Choi, J.W. (2020). 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection. arXiv."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/11\/1895\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:37:54Z","timestamp":1760175474000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/12\/11\/1895"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,11]]},"references-count":44,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["rs12111895"],"URL":"https:\/\/doi.org\/10.3390\/rs12111895","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2020,6,11]]}}}