{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T11:42:03Z","timestamp":1781264523116,"version":"3.54.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T00:00:00Z","timestamp":1702598400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T00:00:00Z","timestamp":1702598400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61801414"],"award-info":[{"award-number":["61801414"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072391"],"award-info":[{"award-number":["62072391"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>3D object detection is a critical task in the fields of virtual reality and autonomous driving. Given that each sensor has its own strengths and limitations, multi-sensor-based 3D object detection has gained popularity. However, most existing methods extract high-level image semantic features and fuse them with point cloud features, focusing solely on consistent information from both sensors while ignoring their complementary information. 
In this paper, we present a novel two-stage multi-sensor deep neural network, called the adaptive learning point cloud and image diversity feature fusion network (APIDFF-Net), for 3D object detection. Our approach employs the fine-grained image information to complement the point cloud information by combining low-level image features with high-level point cloud features. Specifically, we design a shallow image feature extraction module to learn fine-grained information from images, instead of relying on deep layer features with coarse-grained information. Furthermore, we design a diversity feature fusion (DFF) module that transforms low-level image features into point-wise image features and explores their complementary features through an attention mechanism, ensuring an effective combination of fine-grained image features and point cloud features. Experiments on the KITTI benchmark show that the proposed method outperforms state-of-the-art methods.<\/jats:p>","DOI":"10.1007\/s40747-023-01295-x","type":"journal-article","created":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T19:02:22Z","timestamp":1702666942000},"page":"2825-2837","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Adaptive learning point cloud and image diversity feature fusion network for 3D object 
detection"],"prefix":"10.1007","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7869-2404","authenticated-orcid":false,"given":"Weiqing","family":"Yan","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shile","family":"Liu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hao","family":"Liu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guanghui","family":"Yue","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xuan","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yongchao","family":"Song","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jindong","family":"Xu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,12,15]]},"reference":[{"key":"1295_CR1","doi-asserted-by":"crossref","unstructured":"Chen X, Ma H, Wan J, Li B, Xia T (2017) Multi-view 3d object detection network for autonomous driving. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1907\u20131915","DOI":"10.1109\/CVPR.2017.691"},{"key":"1295_CR2","doi-asserted-by":"crossref","unstructured":"Chen Y, Liu S, Shen X, Jia J (2020) Dsgn: Deep stereo geometry network for 3d object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 12536\u201312545","DOI":"10.1109\/CVPR42600.2020.01255"},{"key":"1295_CR3","doi-asserted-by":"crossref","unstructured":"Chen YN, Dai H, Ding Y (2022) Pseudo-stereo for monocular 3d object detection in autonomous driving. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
887\u2013897","DOI":"10.1109\/CVPR52688.2022.00096"},{"key":"1295_CR4","doi-asserted-by":"crossref","unstructured":"Chen Z, Li Z, Zhang S, Fang L, Jiang Q, Zhao F (2022) Autoalignv2: Deformable feature aggregation for dynamic multi-modal 3d object detection. arXiv preprint arXiv:2207.10316","DOI":"10.24963\/ijcai.2022\/116"},{"key":"1295_CR5","doi-asserted-by":"crossref","unstructured":"Chen Z, Li Z, Zhang S, Fang L, Jiang Q, Zhao F, Zhou B, Zhao H (2022) Autoalign: Pixel-instance feature aggregation for multi-modal 3d object detection. arXiv preprint arXiv:2201.06493","DOI":"10.24963\/ijcai.2022\/116"},{"key":"1295_CR6","doi-asserted-by":"crossref","unstructured":"Deng J, Shi S, Li P, Zhou W, Zhang Y, Li H (2021) Voxel r-cnn: Towards high performance voxel-based 3d object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol.\u00a035, pp. 1201\u20131209","DOI":"10.1609\/aaai.v35i2.16207"},{"key":"1295_CR7","doi-asserted-by":"crossref","unstructured":"Du L, Ye X, Tan X, Feng J, Xu Z, Ding E, Wen S (2020) Associate-3ddet: Perceptual-to-conceptual association for 3d point cloud object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 13329\u201313338","DOI":"10.1109\/CVPR42600.2020.01334"},{"key":"1295_CR8","doi-asserted-by":"crossref","unstructured":"Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, IEEE, pp. 3354\u20133361","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"1295_CR9","unstructured":"Guanghui Y, Xiao H, Xie H, Zhou T, Zhou W, Yan W, Zhao B, Wang T, Jiang Q (2023) Dual-constraint coarse-to-fine network for camouflaged object detection. 
IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"1295_CR10","doi-asserted-by":"crossref","unstructured":"He C, Zeng H, Huang J, Hua XS, Zhang L (2020) Structure aware single-stage 3d object detection from point cloud. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 11873\u201311882","DOI":"10.1109\/CVPR42600.2020.01189"},{"key":"1295_CR11","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1295_CR12","doi-asserted-by":"crossref","unstructured":"Huang T, Liu Z, Chen X, Bai X (2020) Epnet: Enhancing point features with image semantics for 3d object detection. In: European Conference on Computer Vision, pp. 35\u201352. Springer","DOI":"10.1007\/978-3-030-58555-6_3"},{"key":"1295_CR13","doi-asserted-by":"crossref","unstructured":"Ku J, Mozifian M, Lee J, Harakeh A, Waslander SL (2018) Joint 3d proposal generation and object detection from view aggregation. In: 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1\u20138. IEEE","DOI":"10.1109\/IROS.2018.8594049"},{"key":"1295_CR14","doi-asserted-by":"crossref","unstructured":"Lang AH, Vora S, Caesar H, Zhou L, Yang J, Beijbom O (2019) Pointpillars: Fast encoders for object detection from point clouds. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 12697\u201312705","DOI":"10.1109\/CVPR.2019.01298"},{"key":"1295_CR15","doi-asserted-by":"crossref","unstructured":"Li P, Chen X, Shen S (2019) Stereo r-cnn based 3d object detection for autonomous driving. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
7644\u20137652","DOI":"10.1109\/CVPR.2019.00783"},{"key":"1295_CR16","unstructured":"Liang Z, Zhang M, Zhang Z, Zhao X, Pu S (2020) Rangercnn: Towards fast and accurate 3d object detection with range image representation. arXiv preprint arXiv:2009.00206"},{"key":"1295_CR17","doi-asserted-by":"crossref","unstructured":"Lin TY, Goyal P, Girshick R, He K, Doll\u00e1r P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 2980\u20132988","DOI":"10.1109\/ICCV.2017.324"},{"key":"1295_CR18","doi-asserted-by":"crossref","unstructured":"Liu X, Xue N, Wu T (2022) Learning auxiliary monocular contexts helps monocular 3d object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol.\u00a036, pp. 1810\u20131818","DOI":"10.1609\/aaai.v36i2.20074"},{"key":"1295_CR19","doi-asserted-by":"crossref","unstructured":"Liu Z, Zhao X, Huang T, Hu R, Zhou Y, Bai X (2020) Tanet: Robust 3d object detection from point clouds with triple attention. In: Proceedings of the AAAI conference on artificial intelligence, vol.\u00a034, pp. 11677\u201311684","DOI":"10.1609\/aaai.v34i07.6837"},{"key":"1295_CR20","doi-asserted-by":"crossref","unstructured":"Lu Y, Ma X, Yang L, Zhang T, Liu Y, Chu Q, Yan J, Ouyang W (2021) Geometry uncertainty projection network for monocular 3d object detection. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 3111\u20133121","DOI":"10.1109\/ICCV48922.2021.00310"},{"key":"1295_CR21","doi-asserted-by":"crossref","unstructured":"Noh J, Lee S, Ham B (2021) Hvpr: Hybrid voxel-point representation for single-stage 3d object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
14605\u201314614","DOI":"10.1109\/CVPR46437.2021.01437"},{"key":"1295_CR22","doi-asserted-by":"crossref","unstructured":"Paigwar A, Sierra-Gonzalez D, Erkent \u00d6, Laugier C (2021) Frustum-pointpillars: A multi-stage approach for 3d object detection using rgb camera and lidar. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 2926\u20132933","DOI":"10.1109\/ICCVW54120.2021.00327"},{"key":"1295_CR23","doi-asserted-by":"crossref","unstructured":"Pang S, Morris D, Radha H (2020) Clocs: Camera-lidar object candidates fusion for 3d object detection. In: 2020 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10386\u201310393. IEEE","DOI":"10.1109\/IROS45743.2020.9341791"},{"key":"1295_CR24","doi-asserted-by":"crossref","unstructured":"Qi CR, Liu W, Wu C, Su H, Guibas LJ (2018) Frustum pointnets for 3d object detection from rgb-d data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 918\u2013927","DOI":"10.1109\/CVPR.2018.00102"},{"key":"1295_CR25","unstructured":"Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: Deep learning on point sets for 3d classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 652\u2013660"},{"key":"1295_CR26","unstructured":"Qi CR, Yi L, Su H, Guibas LJ (2017) Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems 30"},{"key":"1295_CR27","doi-asserted-by":"crossref","unstructured":"Reading C, Harakeh A, Chae J, Waslander SL (2021) Categorical depth distribution network for monocular 3d object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
8555\u20138564","DOI":"10.1109\/CVPR46437.2021.00845"},{"key":"1295_CR28","doi-asserted-by":"crossref","unstructured":"Shi S, Guo C, Jiang L, Wang Z, Shi J, Wang X, Li H (2020) Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 10529\u201310538","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"1295_CR29","doi-asserted-by":"crossref","unstructured":"Shi S, Wang X, Li H (2019) Pointrcnn: 3d object proposal generation and detection from point cloud. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 770\u2013779","DOI":"10.1109\/CVPR.2019.00086"},{"issue":"8","key":"1295_CR30","first-page":"2647","volume":"43","author":"S Shi","year":"2020","unstructured":"Shi S, Wang Z, Shi J, Wang X, Li H (2020) From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Trans Pattern Anal Mach Intell 43(8):2647\u20132664","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1295_CR31","doi-asserted-by":"crossref","unstructured":"Simonelli A, Bulo SR, Porzi L, L\u00f3pez-Antequera M, Kontschieder P (2019) Disentangling monocular 3d object detection. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 1991\u20131999","DOI":"10.1109\/ICCV.2019.00208"},{"key":"1295_CR32","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556"},{"key":"1295_CR33","doi-asserted-by":"crossref","unstructured":"Vora S, Lang AH, Helou B, Beijbom O (2020) Pointpainting: Sequential fusion for 3d object detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 
4604\u20134612","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"1295_CR34","doi-asserted-by":"crossref","unstructured":"Wang Y, Chao WL, Garg D, Hariharan B, Campbell M, Weinberger KQ (2019) Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 8445\u20138453","DOI":"10.1109\/CVPR.2019.00864"},{"key":"1295_CR35","doi-asserted-by":"crossref","unstructured":"Wang Z, Jia K (2019) Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. In: 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1742\u20131749. IEEE","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"1295_CR36","doi-asserted-by":"crossref","unstructured":"Wang Z, Jia K (2019) Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. In: 2019 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1742\u20131749. IEEE","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"1295_CR37","doi-asserted-by":"crossref","unstructured":"Xie L, Xiang C, Yu Z, Xu G, Yang Z, Cai D, He X (2020) Pi-rcnn: An efficient multi-sensor 3d object detector with point-based attentive cont-conv fusion module. In: Proceedings of the AAAI conference on artificial intelligence, vol.\u00a034, pp. 12460\u201312467","DOI":"10.1609\/aaai.v34i07.6933"},{"key":"1295_CR38","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101832","volume":"98","author":"W Yan","year":"2023","unstructured":"Yan W, Gu M, Ren J, Yue G, Liu Z, Xu J, Lin W (2023) Collaborative structure and feature learning for multi-view clustering. 
Information Fusion 98:101832","journal-title":"Information Fusion"},{"issue":"10","key":"1295_CR39","doi-asserted-by":"publisher","first-page":"3337","DOI":"10.3390\/s18103337","volume":"18","author":"Y Yan","year":"2018","unstructured":"Yan Y, Mao Y, Li B (2018) Second: Sparsely embedded convolutional detection. Sensors 18(10):3337","journal-title":"Sensors"},{"key":"1295_CR40","doi-asserted-by":"crossref","unstructured":"Yang Z, Sun Y, Liu S, Shen X, Jia J (2019) Std: Sparse-to-dense 3d object detector for point cloud. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp. 1951\u20131960","DOI":"10.1109\/ICCV.2019.00204"},{"key":"1295_CR41","doi-asserted-by":"crossref","unstructured":"Yin T, Zhou X, Krahenbuhl P (2021) Center-based 3d object detection and tracking. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 11784\u201311793","DOI":"10.1109\/CVPR46437.2021.01161"},{"key":"1295_CR42","doi-asserted-by":"crossref","unstructured":"Yoo JH, Kim Y, Kim J, Choi JW (2020) 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. In: European Conference on Computer Vision, pp. 720\u2013736. Springer","DOI":"10.1007\/978-3-030-58583-9_43"},{"key":"1295_CR43","doi-asserted-by":"crossref","unstructured":"Zhang Y, Hu Q, Xu G, Ma Y, Wan J, Guo Y (2022) Not all points are equal: Learning highly efficient point-based detectors for 3d lidar point clouds. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 18953\u201318962","DOI":"10.1109\/CVPR52688.2022.01838"},{"key":"1295_CR44","unstructured":"Zhang Z, Zhang M, Liang Z, Zhao X, Yang M, Tan W, Pu S (2020) Maff-net: Filter false positive for 3d vehicle detection with multi-modal adaptive feature fusion. arXiv e-prints pp. 
arXiv\u20132009"},{"key":"1295_CR45","doi-asserted-by":"crossref","unstructured":"Zhao K, Ma L, Meng Y, Liu L, Wang J, Junior JM, Gon\u00e7alves WN, Li J (2022) 3d vehicle detection using multi-level fusion from point clouds and images. IEEE Transactions on Intelligent Transportation Systems","DOI":"10.1109\/TITS.2021.3137392"},{"key":"1295_CR46","doi-asserted-by":"crossref","unstructured":"Zheng W, Tang W, Jiang L, Fu CW (2021) Se-ssd: Self-ensembling single-stage object detector from point cloud. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 14494\u201314503","DOI":"10.1109\/CVPR46437.2021.01426"},{"key":"1295_CR47","doi-asserted-by":"publisher","first-page":"1329","DOI":"10.1109\/TIP.2023.3242775","volume":"32","author":"W Zhou","year":"2023","unstructured":"Zhou W, Zhu Y, Lei J, Yang R, Yu L (2023) Lsnet: Lightweight spatial boosting network for detecting salient objects in rgb-thermal images. IEEE Trans Image Process 32:1329\u20131340","journal-title":"IEEE Trans Image Process"},{"key":"1295_CR48","doi-asserted-by":"crossref","unstructured":"Zhou Y, Tuzel O (2018) Voxelnet: End-to-end learning for point cloud based 3d object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 
4490\u20134499","DOI":"10.1109\/CVPR.2018.00472"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01295-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01295-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01295-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,30]],"date-time":"2024-03-30T15:35:29Z","timestamp":1711812929000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01295-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,15]]},"references-count":48,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["1295"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01295-x","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,15]]},"assertion":[{"value":"20 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 November 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 December 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}