{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T18:14:06Z","timestamp":1764785646315,"version":"3.41.0"},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T00:00:00Z","timestamp":1749427200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T00:00:00Z","timestamp":1749427200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Anhui Provincial Department of Education Natural Science Key Project","award":["KJ2021A0662"],"award-info":[{"award-number":["KJ2021A0662"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Discov Artif Intell"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Significant progress on 3D object detection in point cloud has been made in the detection of large objects with clear shape and contour information, such as cars. However, existing algorithms still face significant challenges in detecting tiny objects such as cyclists and pedestrians. This paper presents a novel feature fusion method named Point Voxel Local Feature Fusion (PVLF), which deeply integrates point cloud and voxel information. PVLF explores local spatial features to improve accuracy regarding tiny object detection. To address the potential complex computational issues in the convolution process, we have designed an innovative Adaptive Sparse Convolution (ASC) module that effectively eliminates redundant information in the feature layer. Due to long-range dependencies in point cloud feature extraction, the Dynamic Graph Convolution combined with Transformer (DGFormer) is developed as the point cloud feature encoder. 
DGFormer expands the receptive field to capture contextual information and can hence improve deep representation learning. We also introduce a sector-segmentation-based sampling strategy and achieve parallel sampling of key points through our Adjacency Distance Update Farthest Point Sampling (ADUFPS) algorithm, significantly reducing computational overhead while improving sampling efficiency. Experimental results on the KITTI and Waymo datasets show that our method outperforms, particularly in tiny object detection tasks, state-of-the-art deep models that are also based on point-voxel feature fusion.<\/jats:p>","DOI":"10.1007\/s44163-025-00299-5","type":"journal-article","created":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T14:33:47Z","timestamp":1749479627000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["PVLF: point-voxel local feature fusion for 3D detection"],"prefix":"10.1007","volume":"5","author":[{"given":"Haowei","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Zhuolei","family":"Xiao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,9]]},"reference":[{"issue":"10","key":"299_CR1","doi-asserted-by":"publisher","first-page":"3337","DOI":"10.3390\/s18103337","volume":"18","author":"Y Yan","year":"2018","unstructured":"Yan Y, Mao Y, Li Bo. Second: Sparsely embedded convolutional detection. Sensors. 2018;18(10):3337.","journal-title":"Sensors"},{"key":"299_CR2","doi-asserted-by":"crossref","unstructured":"Zhou Y, Tuzel O. Voxelnet: end-to-end learning for point cloud based 3d object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, p. 4490\u20134499.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"299_CR3","doi-asserted-by":"crossref","unstructured":"Lang AH, Vora S, Caesar H, Zhou H, Yang J, Beijbom O. 
Pointpillars: fast encoders for object detection from point clouds. In: CVPR, 2019.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"299_CR4","doi-asserted-by":"crossref","unstructured":"Shi S, Wang X, Li H. Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2019, p. 770\u2013779.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"299_CR5","unstructured":"Qi CR, Yi L, Su H, Guibas LJ. Pointnet++ deep hierarchical feature learning on point sets in a metric space. In: Proceedings of the 31st international conference on neural information processing systems, 2017, p. 5105\u20135114."},{"key":"299_CR6","unstructured":"Qi CR, Su H, Mo K, Guibas LJ. Pointnet: deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, p. 652\u2013660."},{"key":"299_CR7","doi-asserted-by":"crossref","unstructured":"Shi S, Guo C, Jiang J, Wang Z, Shi J, Wang X, Li H. Pvrcnn: point-voxel feature set abstraction for 3d object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2020.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"299_CR8","unstructured":"Shi S, Jiang L, Deng J, Wang Z, Guo C, Shi J, Wang X, Li H. Pvrcnn++: point-voxel feature set abstraction with local vector representation for 3d object detection."},{"key":"299_CR9","doi-asserted-by":"crossref","unstructured":"Yang Z, Sun Y, Liu S, Jia J. 3dssd: point-based 3d single stage object detector. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, 2020, p. 11040\u201311048.","DOI":"10.1109\/CVPR42600.2020.01105"},{"key":"299_CR10","doi-asserted-by":"crossref","unstructured":"Zhang Y, Hu Q, Xu G, MaY, Wan J, Guo Y. Not all points are equal: Learning highly efficient point-based detectors for 3d lidar point clouds. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, p. 18953\u201318962, 2022.","DOI":"10.1109\/CVPR52688.2022.01838"},{"key":"299_CR11","unstructured":"Ma X, Qin C, You H, Ran H, Fu Y. Rethinking network design and local geometry in point cloud: a simple residual mlp framework,\u201d arXiv preprint arXiv:2202.07123, 2022."},{"key":"299_CR12","doi-asserted-by":"crossref","unstructured":"Shi W, Rajkumar R. Point-gnn: graph neural network for 3d object detection in a point cloud. In: IEEE conference on computer vision and pattern recognition (CVPR), 2020, p. 1708\u20131716.","DOI":"10.1109\/CVPR42600.2020.00178"},{"key":"299_CR13","doi-asserted-by":"crossref","unstructured":"Chen X, Ma H, Wan J, Li B, Xia T. Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, p. 1907\u20131915.","DOI":"10.1109\/CVPR.2017.691"},{"key":"299_CR14","first-page":"3555","volume":"35","author":"Wu Zheng","year":"2021","unstructured":"Zheng Wu, Tang W, Chen S, Jiang Li, Chi-Wing Fu. Cia-ssd: confident iou-aware single-stage object detector from point cloud. Proc AAAI Conf Artif Intell. 2021;35:3555\u201362.","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"299_CR15","doi-asserted-by":"crossref","unstructured":"Yang Z, Sun Y, Liu S, Shen X, Jia J. Std: Sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE\/CVF international conference on computer vision, 2019, p. 1951\u20131960.","DOI":"10.1109\/ICCV.2019.00204"},{"key":"299_CR16","doi-asserted-by":"crossref","unstructured":"Noh J, Lee S, Ham B. Hvpr: Hybrid voxel-point representation for single-stage 3d object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), p. 
14605\u201314614, 2021.","DOI":"10.1109\/CVPR46437.2021.01437"},{"key":"299_CR17","doi-asserted-by":"crossref","unstructured":"Jiang T, Song N, Liu H, Yin R, Gong Y, Yao J. Vic-net: voxelization information compensation network for point cloud 3d object detection. In: 2021 IEEE international conference on robotics and automation (ICRA), IEEE, 2021, p. 13408\u201313414.","DOI":"10.1109\/ICRA48506.2021.9561597"},{"key":"299_CR18","unstructured":"He Q, Wang Z, Zeng H, Zeng Y, Liu S, Zeng B. SVGA-Net: sparse voxel-graph attention network for 3D object detection from point clouds. arXiv 2020, arXiv:abs\/2006.04043."},{"key":"299_CR19","doi-asserted-by":"crossref","unstructured":"He C, Zeng H, Huang J, Hua XS, Zhang L. Structure aware singlestage 3d object detection from point cloud. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, 2020, p. 11873\u201311882.","DOI":"10.1109\/CVPR42600.2020.01189"},{"key":"299_CR20","unstructured":"Shi S, Wang Z, Wang X, Li H. Part-a\u02c6 2 net: 3d part-aware and aggregation neural network for object detection from point cloud. arXiv preprint arXiv:1907.03670, 2(3), 2019."},{"key":"299_CR21","doi-asserted-by":"crossref","unstructured":"Wang Z, Jia K. Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal. In: 2019 IEEE\/RSJ international conference on intelligent robots and systems (IROS), IEEE, 2019. p. 1742\u20131749.","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"299_CR22","unstructured":"OpenPCDet Development Team. Openpcdet: an open-source toolbox for 3d object detection from point clouds. https:\/\/github.com\/open-mmlab\/OpenPCDet, 2020."},{"key":"299_CR23","doi-asserted-by":"crossref","unstructured":"Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The kitti vision benchmark suite. In: 2012 IEEE conference on computer vision and pattern recognition, IEEE, 2012. pp. 
3354\u20133361.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"299_CR24","unstructured":"KINGMA DiederikP, BA J. Adam: a method for stochastic optimization. arXiv: Learning, arXiv: Learning, 2014."},{"issue":"9","key":"299_CR25","doi-asserted-by":"publisher","first-page":"1305","DOI":"10.1109\/83.623193","volume":"6","author":"Y Eldar","year":"1997","unstructured":"Eldar Y, Lindenbaum M, Porat M, Zeevi YY. The farthest point strategy for progressive image sampling. IEEE Trans Image Process. 1997;6(9):1305\u201315.","journal-title":"IEEE Trans Image Process"},{"key":"299_CR26","doi-asserted-by":"crossref","unstructured":"Yuxin W, Kaiming H. Group normalization. In: Proceedings of the European conference on computer vision (ECCV), 2018, p. 3\u201319.","DOI":"10.1007\/978-3-030-01261-8_1"},{"key":"299_CR27","unstructured":"Li Y, Rui B, Mingchao S, Wei W, Xinhan D, Baoquan C. Pointcnn: convolution on X-transformed points. In: NIPS, 2018."},{"key":"299_CR28","doi-asserted-by":"crossref","unstructured":"Thomas H, Qi CR, Deschaud JE, Beatriz Marcotegui, Goulette F, Guibas LJ. Kpconv: flexible and deformable convolution for point clouds. In: Proceedings of the IEEE\/CVF international conference on computer vision, 2019, p. 6411\u20136420.","DOI":"10.1109\/ICCV.2019.00651"},{"issue":"12","key":"299_CR29","doi-asserted-by":"publisher","first-page":"3100","DOI":"10.1007\/s11263-022-01682-w","volume":"130","author":"Q Zhang","year":"2022","unstructured":"Zhang Q, Hou J, Qian Y, Chan AB, Zhang J, He Y. Reggeonet: learning regular representations for large-scale 3d point clouds. Int J Comput Vision. 2022;130(12):3100\u201322.","journal-title":"Int J Comput Vision"},{"key":"299_CR30","doi-asserted-by":"crossref","unstructured":"Afham M, Dissanayake I, Dissanayake D, Dharmasiri A, Thilakarathna K, Rodrigo R. Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, 2022, p. 9902\u20139912.","DOI":"10.1109\/CVPR52688.2022.00967"},{"key":"299_CR31","doi-asserted-by":"publisher","first-page":"754","DOI":"10.1109\/TMM.2023.3286981","volume":"27","author":"Q Zhang","year":"2023","unstructured":"Zhang Q, Hou J, Qian Y. Pointmcd: boosting deep point cloud encoders via multi-view cross-modal distillation for 3d shape recognition. IEEE Trans Multim. 2023;27:754\u201367.","journal-title":"IEEE Trans Multim"},{"key":"299_CR32","doi-asserted-by":"crossref","unstructured":"Su H, Maji S, Kalogerakis E, Erik G. Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In: ICCV, 2015.","DOI":"10.1109\/ICCV.2015.114"},{"key":"299_CR33","unstructured":"Li B, Zhang T, Xia T. Vehicle detection from 3d lidar using fully convolutional network. In: RSS, 2016."},{"key":"299_CR34","doi-asserted-by":"crossref","unstructured":"Tatarchenko M, Park J, Koltun V, Zhou Q. Tangent convolutions for dense prediction in 3d. In: CVPR, 2018.","DOI":"10.1109\/CVPR.2018.00409"},{"key":"299_CR35","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: NIPS, 2017."},{"key":"299_CR36","doi-asserted-by":"crossref","unstructured":"Hu H, Zhang Z, Xie Z, Lin S. Local relation networks for image recognition. In: ICCV, 2019.","DOI":"10.1109\/ICCV.2019.00356"},{"key":"299_CR37","doi-asserted-by":"crossref","unstructured":"Zhao H, Jia J, Koltun V. Exploring self-attention for image recognition. In: CVPR, 2020.","DOI":"10.1109\/CVPR42600.2020.01009"},{"issue":"2","key":"299_CR38","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1007\/s41095-021-0229-5","volume":"7","author":"MH Guo","year":"2021","unstructured":"Guo MH, Cai JX, Liu ZN, Mu TJ, Martin RR, Hu S. Pct: point cloud transformer. Comput Visual Media. 
2021;7(2):187\u201399.","journal-title":"Comput Visual Media"},{"key":"299_CR39","first-page":"33330","volume-title":"Proceedings of the annual conference on neural information processing systems, 35","author":"X Wu","year":"2022","unstructured":"Wu X, Lao Y, Jiang L, Liu X, Zhao H. Point transformer V2: grouped vector attention and partition-based pooling. In: Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, editors. Proceedings of the annual conference on neural information processing systems, 35. New York: Curran Associates Inc; 2022. p. 33330\u201342."},{"key":"299_CR40","doi-asserted-by":"crossref","unstructured":"Wu X, Jiang L, Wang PS, Liu Z, Liu X, Qiao Y, Ouyang W, He T, Zhao H. Point transformer v3: Simpler, faster, stronger. arXiv preprint arXiv:2312.10035, 2023.","DOI":"10.1109\/CVPR52733.2024.00463"},{"key":"299_CR41","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N, An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations, ICLR, 2021."},{"key":"299_CR42","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.09077","author":"Y Zhang","year":"2023","unstructured":"Zhang Y, Zhang Q, Hou J, Yuan Y, Xing G. Unleash the potential of image branch for cross-modal 3d object detection. Adv Neural Inf Process Syst. 2023. https:\/\/doi.org\/10.48550\/arXiv.2301.09077.","journal-title":"Adv Neural Inf Process Syst"},{"key":"299_CR43","doi-asserted-by":"crossref","unstructured":"Wu H, Wen C, Shi S, Li X, Wang C, Virtual sparse convolution for multimodal 3D object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), Vancouver, Canada, 2023, pp. 
21653\u201321662.","DOI":"10.1109\/CVPR52729.2023.02074"},{"issue":"12","key":"299_CR44","doi-asserted-by":"publisher","first-page":"3332","DOI":"10.1007\/s11263-023-01869-9","volume":"131","author":"Y Zhang","year":"2023","unstructured":"Zhang Y, Zhang Q, Zhu Z, Hou J, Yuan Y. Glenet: boosting 3d object detectors with generative label uncertainty estimation. Int J Comput Vision. 2023;131(12):3332\u201352.","journal-title":"Int J Comput Vision"},{"key":"299_CR45","doi-asserted-by":"crossref","unstructured":"Lin Y, Yan Z, Huang H, Du D, Liu L, Cui S, Han X. Fpconv: Learning local flattening for point convolution. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, 2020, p. 4293\u20134302.","DOI":"10.1109\/CVPR42600.2020.00435"},{"key":"299_CR46","doi-asserted-by":"crossref","unstructured":"Zhang Q, Hou J, Qian Y, Zeng Y, Zhang J, He Y. Flattening-net: deep regular 2d representation for 3d point cloud analysis. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.","DOI":"10.1109\/TPAMI.2023.3244828"},{"key":"299_CR47","unstructured":"Zhang Q, Hou J, Wang W, He Y. Flatten anything: unsupervised neural surface parameterization. arXiv preprint arXiv:2405.14633, 2024."},{"key":"299_CR48","doi-asserted-by":"crossref","unstructured":"Wang C, Wu M, Lam S-K, Ning X, Yu S, Wang R, Li W, Srikanthan T. 2025. Gpsformer: a global perception and local structure fitting-based transformer for point cloud understanding. In: European conference on computer vision, p. 75\u201392. Springer.","DOI":"10.1007\/978-3-031-73242-3_5"},{"key":"299_CR49","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3328712","author":"C Wang","year":"2023","unstructured":"Wang C, Ning X, Li W, Bai X, Gao X. 3D person re-identification based on global semantic guidance and local feature aggregation. IEEE Trans Circ Syst Video Technol. 2023. 
https:\/\/doi.org\/10.1109\/TCSVT.2023.3328712.","journal-title":"IEEE Trans Circ Syst Video Technol"},{"key":"299_CR50","doi-asserted-by":"publisher","first-page":"113125","DOI":"10.1016\/j.knosys.2025.113125","volume":"311","author":"C Wang","year":"2025","unstructured":"Wang C, Cao R, Wang R. Learning discriminative topological structure information representation for 2D shape and social network classification via persistent homology. Knowl Based Syst. 2025;311:113125.","journal-title":"Knowl Based Syst"},{"key":"299_CR51","doi-asserted-by":"crossref","unstructured":"Jiang L, Wang C, Ning X, Yu Z. LTTPoint: A MLP-based point cloud classification method with local topology transformation module. In: 2023 7th Asian conference on artificial intelligence technology (ACAIT), 2023. p. 783\u2013789. IEEE.","DOI":"10.1109\/ACAIT60137.2023.10528609"},{"key":"299_CR52","first-page":"1","volume":"60","author":"C Wang","year":"2022","unstructured":"Wang C, Ning X, Sun L, Zhang L, Li W, Bai X. Learning discriminative features by covering local geometric space for point cloud analysis. IEEE Trans Geosci Remote Sens. 2022;60:1\u201315.","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"299_CR53","doi-asserted-by":"crossref","unstructured":"Sun P, Kretzschmar H, Dotiwalla X, Chouard A, Patnaik V, Tsui P, Guo J, ZhouY, Chai Y, Caine B, et al. Scalability in perception for autonomous driving: Waymo open dataset. In: CVPR, p. 2446\u20132454, 2020.","DOI":"10.1109\/CVPR42600.2020.00252"},{"key":"299_CR54","doi-asserted-by":"crossref","unstructured":"Yin T, Zhou X, Krahenbuhl P. Center-based 3d object detection and tracking. 
In: CVPR, 2021.","DOI":"10.1109\/CVPR46437.2021.01161"}],"container-title":["Discover Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-025-00299-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44163-025-00299-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44163-025-00299-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T14:33:56Z","timestamp":1749479636000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44163-025-00299-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,9]]},"references-count":54,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["299"],"URL":"https:\/\/doi.org\/10.1007\/s44163-025-00299-5","relation":{},"ISSN":["2731-0809"],"issn-type":[{"value":"2731-0809","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,9]]},"assertion":[{"value":"4 February 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 June 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"93"}}