{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T10:04:30Z","timestamp":1764842670883,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2022,10,2]],"date-time":"2022-10-02T00:00:00Z","timestamp":1664668800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the Youth Innovation Promotion Association, CAS"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Continuous frames of point-cloud-based object detection is a new research direction. Currently, most research studies fuse multi-frame point clouds using concatenation-based methods. The method aligns different frames by using information on GPS, IMU, etc. However, this fusion method can only align static objects and not moving objects. In this paper, we proposed a non-local-based multi-scale feature fusion method, which can handle both moving and static objects without GPS- and IMU-based registrations. Considering that non-local methods are resource-consuming, we proposed a novel simplified non-local block based on the sparsity of the point cloud. By filtering out empty units, memory consumption decreased by 99.93%. In addition, triple attention is adopted to enhance the key information on the object and suppresses background noise, further benefiting non-local-based feature fusion methods. Finally, we verify the method based on PointPillars and CenterPoint. Experimental results show that the mAP of the proposed method improved by 3.9% and 4.1% in mAP compared with concatenation-based fusion modules, PointPillars-2 and CenterPoint-2, respectively. In addition, the proposed network outperforms powerful 3D-VID by 1.2% in mAP.<\/jats:p>","DOI":"10.3390\/s22197473","type":"journal-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T03:07:28Z","timestamp":1665371248000},"page":"7473","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2998-4333","authenticated-orcid":false,"given":"Zhenyu","family":"Zhai","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1266-0324","authenticated-orcid":false,"given":"Qiantong","family":"Wang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5041-3300","authenticated-orcid":false,"given":"Zongxu","family":"Pan","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2926-8620","authenticated-orcid":false,"given":"Zhentong","family":"Gao","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenlong","family":"Hu","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., and Beijbom, O. (2020, January 14\u201319). Nuscenes: A multimodal dataset for autonomous driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision meets robotics: The kitti dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Huang, R., Zhang, W., Kundu, A., Pantofaru, C., Ross, D.A., Funkhouser, T., and Fathi, A. (2020, January 23\u201328). An lstm approach to temporal 3d object detection in lidar point clouds. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58523-5_16"},{"key":"ref_4","unstructured":"El Sallab, A., Sobh, I., Zidan, M., Zahran, M., and Abdelkarim, S. (2018, January 5\u201310). YOLO4D: A Spatio-temporal Approach for Real-time Multi-object Detection and Classification from LiDAR Point Clouds. Proceedings of the Neural Information Processing Systems (NIPS), Machine Learning in Inteligent Transportation MLITS Workshop, Montreal, QC, Canada."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Luo, W., Yang, B., and Urtasun, R. (2018, January 18\u201323). Fast and furious: Real time end-to-end 3d detection, tracking and motion forecasting with a single convolutional net. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00376"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Zhou, Y., Najibi, M., Sun, P., Vo, K., Deng, B., and Anguelov, D. (2021, January 20\u201325). Offboard 3d object detection from point cloud sequences. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00607"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sun, J., Xie, Y., Zhang, S., Chen, L., Zhang, G., Bao, H., and Zhou, X. (2021, January 10\u201317). You Don\u2019t Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00317"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yin, J., Shen, J., Guan, C., Zhou, D., and Yang, R. (2020, January 14\u201319). Lidar-based online 3d video object detection with graph-based message passing and spatiotemporal transformer attention. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01151"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_10","unstructured":"Zhang, Y., Ye, Y., Xiang, Z., and Gu, J. (December, January 30). SDP-Net: Scene Flow Based Real-Time Object Detection and Prediction from Sequential 3D Point Clouds. Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhao, X., Huang, T., Hu, R., Zhou, Y., and Bai, X. (2020, January 7\u201312). Tanet: Robust 3d object detection from point clouds with triple attention. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6837"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 15\u201320). Pointpillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yin, T., Zhou, X., and Krahenbuhl, P. (2021, January 20\u201325). Center-based 3d object detection and tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01161"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chen, Y., Tai, L., Sun, K., and Li, M. (2020, January 14\u201319). MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01211"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ma, X., Wang, Z., Li, H., Zhang, P., Ouyang, W., and Fan, X. (November, January 27). Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00695"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, Y., Chao, W.L., Garg, D., Hariharan, B., Campbell, M., and Weinberger, K.Q. (2019, January 15\u201320). Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00864"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Chen, Y., Liu, S., Shen, X., and Jia, J. (2020, January 14\u201319). DSGN: Deep Stereo Geometry Network for 3D Object Detection. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01255"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201320). Pointrcnn: 3D object proposal generation and detection from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201323). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_22","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_23","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Hu, P., Ziglar, J., Held, D., and Ramanan, D. (2020, January 14\u201319). What you see is what you get: Exploiting visibility for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01101"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3D Object Detection Network for Autonomous Driving. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3D Proposal Generation and Object Detection from View Aggregation. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum PointNets for 3D Object Detection from RGB-D Data. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yoo, J.H., Kim, Y., Kim, J., and Choi, J.W. (2020, January 23\u201328). 3d-cvf: Generating joint camera and lidar features using cross-view spatial feature fusion for 3d object detection. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58583-9_43"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 14\u201319). Pointpainting: Sequential fusion for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"ref_30","unstructured":"Qi, C.R., Litany, O., He, K., and Guibas, L.J. (November, January 27). Deep hough voting for 3d object detection in point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhai, Z., Wang, Q., Pan, Z., Hu, W., and Hu, Y. (2022, January 17\u201320). 3D Object Detection Based on Feature Fusion of Point Cloud Sequences. Proceedings of the 2022 IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), Chengdu, China.","DOI":"10.1109\/ICIEA54703.2022.10006093"},{"key":"ref_32","unstructured":"Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., and Dahl, G.E. (2017, January 6\u201311). Neural message passing for quantum chemistry. Proceedings of the International conference on machine learning. PMLR, Sydney, NSW, Australia."},{"key":"ref_33","unstructured":"Ballas, N., Yao, L., Pal, C., and Courville, A. (2015, January 7\u20139). Delving Deeper into Convolutional Networks for Learning Video Representations. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_34","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (November, January 27). Ccnet: Criss-cross attention for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_35","unstructured":"Hu, H., Zhang, Z., Xie, Z., and Lin, S. (November, January 27). Local relation networks for image recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_36","unstructured":"Zhu, Z., Xu, M., Bai, S., Huang, T., and Bai, X. (November, January 27). Asymmetric non-local neural networks for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_37","unstructured":"Huang, L., Yuan, Y., Guo, J., Zhang, C., Chen, X., and Wang, J. (2019). Interlaced sparse self-attention for semantic segmentation. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhang, L., Xu, D., Arnab, A., and Torr, P.H. (2020, January 14\u201319). Dynamic graph message passing networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00378"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_40","first-page":"2011","article-title":"Squeeze-and-Excitation Networks","volume":"Volume 42","author":"Jie","year":"2017","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"ref_41","unstructured":"Team, O.D. (2021, November 10). OpenPCDet: An Open-Source Toolbox for 3D Object Detection from Point Clouds. Available online: https:\/\/github.com\/open-mmlab\/OpenPCDet."},{"key":"ref_42","unstructured":"Zhu, B., Jiang, Z., Zhou, X., Li, Z., and Yu, G. (2019). Class-balanced grouping and sampling for point cloud 3d object detection. arXiv Preprint."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1219","DOI":"10.1109\/TPAMI.2020.3025077","article-title":"Disentangling monocular 3d object detection: From single to multi-class recognition","volume":"Volume 44","author":"Simonelli","year":"2020","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.neucom.2019.09.086","article-title":"Sarpnet: Shape attention regional proposal network for lidar-based 3d object detection","volume":"379","author":"Ye","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"6637","DOI":"10.1007\/s00521-021-06061-z","article-title":"GVnet: Gaussian model with voxel-based 3D detection network for autonomous driving","volume":"34","author":"Qin","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wang, J., Lan, S., Gao, M., and Davis, L.S. (2020, January 23\u201328). Infofocus: 3d object detection for autonomous driving with dynamic information modeling. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58607-2_24"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"160299","DOI":"10.1109\/ACCESS.2021.3131389","article-title":"BirdNet+: Two-Stage 3D Object Detection in LiDAR Through a Sparsity-Invariant Bird\u2019s Eye View","volume":"9","author":"Barrera","year":"2021","journal-title":"IEEE Access"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., and Jia, J. (2020, January 14\u201319). 3dssd: Point-based 3d single stage object detector. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01105"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Du, L., Ye, X., Tan, X., Johns, E., Chen, B., Ding, E., Xue, X., and Feng, J. (2021). Ago-net: Association-guided 3d point cloud object detection network. IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE.","DOI":"10.1109\/TPAMI.2021.3104172"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zhu, X., Ma, Y., Wang, T., Xu, Y., Shi, J., and Lin, D. (2020, January 23\u201328). Ssn: Shape signature networks for multi-class object detection from point clouds. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58595-2_35"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/19\/7473\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:45:30Z","timestamp":1760143530000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/19\/7473"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,2]]},"references-count":50,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["s22197473"],"URL":"https:\/\/doi.org\/10.3390\/s22197473","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,10,2]]}}}