{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:16:14Z","timestamp":1760145374947,"version":"build-2065373602"},"reference-count":48,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T00:00:00Z","timestamp":1721260800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Research Project on Laser object Feature Extraction and Recognition"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Accurate and comprehensive 3D object detection is important for perception systems in autonomous driving. Nevertheless, contemporary mainstream methods tend to perform more effectively on large objects in regions proximate to the LiDAR, leaving limited exploration of long-range objects and small objects. The divergent point pattern of LiDAR, which results in a reduction in point density as the distance increases, leads to a non-uniform point distribution that is ill-suited to discretized volumetric feature extraction. To address this challenge, we propose the Foreground Voxel Proposal (FVP) module, which effectively locates and generates voxels at the foreground of objects. The outputs are subsequently merged to mitigate the difference in point cloud density and complete the object shape. Furthermore, the susceptibility of small objects to occlusion results in the loss of feature space. To overcome this, we propose the Multi-Scale Feature Integration Network (MsFIN), which captures contextual information at different ranges. Subsequently, the outputs of these features are integrated through a cascade framework based on transformers in order to supplement the object feature space. The extensive experimental results demonstrate that our network achieves remarkable results. 
Remarkably, our approach demonstrated an improvement of 8.56% AP on the SECOND baseline for the Car detection task at a distance of more than 20 m, and 9.38% AP on the Cyclist detection task.<\/jats:p>","DOI":"10.3390\/rs16142631","type":"journal-article","created":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T10:39:26Z","timestamp":1721299166000},"page":"2631","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Spatial Information Enhancement with Multi-Scale Feature Aggregation for Long-Range Object and Small Reflective Area Object Detection from Point Cloud"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-8197-1926","authenticated-orcid":false,"given":"Hanwen","family":"Li","sequence":"first","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huamin","family":"Tao","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0748-6683","authenticated-orcid":false,"given":"Qiuqun","family":"Deng","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shanzhu","family":"Xiao","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianxiong","family":"Zhou","sequence":"additional","affiliation":[{"name":"College of Electronic Science and Technology, National University of Defense 
Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,18]]},"reference":[{"key":"ref_1","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_3","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201320). Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_5","first-page":"2647","article-title":"From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network","volume":"43","author":"Shi","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201328). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Deng, J., Shi, S., Li, P., Zhou, W., Zhang, Y., and Li, H. 
(2021, January 2\u20139). Voxel r-cnn: Towards high performance voxel-based 3d object detection. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.","DOI":"10.1609\/aaai.v35i2.16207"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020, January 14\u201319). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Washington, DC, USA.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"108684","DOI":"10.1016\/j.patcog.2022.108684","article-title":"Spatial information enhancement network for 3D object detection from point cloud","volume":"128","author":"Li","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Mao, J., Niu, M., Bai, H., Liang, X., Xu, H., and Xu, C. (2021, January 19\u201325). Pyramid r-cnn: Towards better performance and adaptability for 3d object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.00272"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Mao, J., Xue, Y., Niu, M., Bai, H., Feng, J., Liang, X., Xu, H., and Xu, C. (2021, January 10\u201317). Voxel transformer for 3d object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00315"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zheng, W., Tang, W., Jiang, L., and Fu, C.W. (2021, January 19\u201325). SE-SSD: Self-ensembling single-stage object detector from point cloud. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR46437.2021.01426"},{"key":"ref_14","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Fan, L., Pang, Z., Zhang, T., Wang, Y.X., Zhao, H., Wang, F., Wang, N., and Zhang, Z. (2022, January 19\u201324). Embracing single stride 3d object detector with sparse transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00827"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The kitti dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_17","unstructured":"Qi, C.R., Litany, O., He, K., and Guibas, L.J. (November, January 27). Deep hough voting for 3d object detection in point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_18","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (November, January 27). Std: Sparse-to-dense 3d object detector for point cloud. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., and Jia, J. (2020, January 14\u201319). 3dssd: Point-based 3d single stage object detector. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Washington, DC, USA.","DOI":"10.1109\/CVPR42600.2020.01105"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shi, G., Li, R., and Ma, C. (2022, January 24\u201328). 
Pillarnet: Real-time and high-performance pillar-based 3d object detection. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20080-9_3"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 16\u201317). Pointpillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Corral-Soto, E.R., and Bingbing, L. (November, January 19). Understanding strengths and weaknesses of complementary sensor modalities in early fusion for object detection. Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA.","DOI":"10.1109\/IV47402.2020.9304558"},{"key":"ref_23","unstructured":"Hu, J.S., Kuai, T., and Waslander, S.L. (2022, January 19\u201324). Point density-aware voxels for lidar 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA."},{"key":"ref_24","unstructured":"Arief, H.A., Arief, M., Bhat, M., Indahl, U.G., Tveite, H., and Zhao, D. (2019, January 16\u201317). Density-Adaptive Sampling for Heterogeneous Point Cloud Object Segmentation in Autonomous Vehicle Applications. Proceedings of the CVPR Workshops, Long Beach, CA, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Yuan, W., Khot, T., Held, D., Mertz, C., and Hebert, M. (2018, January 5\u20138). Pcn: Point completion network. Proceedings of the 2018 International Conference on 3D Vision (3DV), Piscataway, NJ, USA.","DOI":"10.1109\/3DV.2018.00088"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Chen, X., Litany, O., and Guibas, L.J. (2020, January 14\u201319). Imvotenet: Boosting 3d object detection in point clouds with image votes. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR42600.2020.00446"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xu, Q., Zhou, Y., Wang, W., Qi, C.R., and Anguelov, D. (2021, January 19\u201325). Spg: Unsupervised domain adaptation for 3d object detection via semantic point generation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.01516"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Tsai, D., Berrio, J.S., Shan, M., Nebot, E., and Worrall, S. (June, January 29). Viewer-centred surface completion for unsupervised domain adaptation in 3D object detection. Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, London, UK.","DOI":"10.1109\/ICRA48891.2023.10160707"},{"key":"ref_29","unstructured":"Chen, X., Chen, B., and Mitra, N.J. (2019). Unpaired point cloud completion on real scans using adversarial training. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, J., Chen, X., Cai, Z., Pan, L., Zhao, H., Yi, S., Yeo, C.K., Dai, B., and Loy, C.C. (2021, January 19\u201325). Unsupervised 3d shape completion through gan inversion. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR46437.2021.00181"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Li, J., Luo, C., and Yang, X. (2023, January 18\u201322). PillarNeXt: Rethinking network designs for 3D object detection in LiDAR point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada.","DOI":"10.1109\/CVPR52729.2023.01685"},{"key":"ref_32","first-page":"11615","article-title":"Mssvt: Mixed-scale sparse voxel transformer for 3d object detection on point clouds","volume":"35","author":"Dong","year":"2022","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","article-title":"Pct: Point cloud transformer","volume":"7","author":"Guo","year":"2021","journal-title":"Comput. Vis. Media"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhao, H., Jiang, L., Jia, J., Torr, P.H., and Koltun, V. (2021, January 19\u201325). Point transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.01595"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"He, C., Li, R., Li, S., and Zhang, L. (2022, January 19\u201324). Voxel set transformer: A set-to-set approach to 3d object detection from point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00823"},{"key":"ref_36","unstructured":"Zhang, C., Wan, H., Liu, S., Shen, X., and Wu, Z. (2021). Pvt: Point-voxel transformer for 3d deep learning. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Park, C., Jeong, Y., Cho, M., and Park, J. (2022, January 19\u201324). Fast point transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01644"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sheng, H., Cai, S., Liu, Y., Deng, B., Huang, J., Hua, X.S., and Zhao, M.J. (2021, January 19\u201325). Improving 3d object detection with channel-wise transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.00274"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo, J., Zhou, Y., Chai, Y., and Caine, B. (2020, January 14\u201319). Scalability in perception for autonomous driving: Waymo open dataset. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR42600.2020.00252"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Chang, M.F., Lambert, J., Sangkloy, P., Singh, J., Bak, S., Hartnett, A., Wang, D., Carr, P., Lucey, S., and Ramanan, D. (2019, January 16\u201317). Argoverse: 3d tracking and forecasting with rich maps. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00895"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Xu, J., Li, Z., Du, B., Zhang, M., and Liu, J. (2020, January 7\u201310). Reluplex made more practical: Leaky ReLU. Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Piscataway, NJ, USA.","DOI":"10.1109\/ISCC50000.2020.9219587"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 21\u201326). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Honolulu, HI, USA.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_43","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_44","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_45","unstructured":"Shi, S., Wang, Z., Wang, X., and Li, H. (2019). Part-a2 net: 3d part-aware and aggregation neural network for object detection from point cloud. arXiv."},{"key":"ref_46","first-page":"16494","article-title":"Multimodal virtual point 3d detection","volume":"34","author":"Yin","year":"2021","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1007\/s11263-022-01710-9","article-title":"PV-RCNN++: Point-voxel feature set abstraction with local vector representation for 3D object detection","volume":"131","author":"Shi","year":"2023","journal-title":"Int. J. Comput. Vis."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 19\u201325). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.00986"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/14\/2631\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:18:59Z","timestamp":1760109539000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/14\/2631"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,18]]},"references-count":48,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["rs16142631"],"URL":"https:\/\/doi.org\/10.3390\/rs16142631","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2024,7,18]]}}}