{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T11:13:31Z","timestamp":1778325211495,"version":"3.51.4"},"reference-count":35,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,12,28]],"date-time":"2020-12-28T00:00:00Z","timestamp":1609113600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018YFB1201602-05"],"award-info":[{"award-number":["2018YFB1201602-05"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Natural Science Foundation of China Enterprise Innovation and Development Joint Fund","award":["U19B2004"],"award-info":[{"award-number":["U19B2004"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>3D object detection in LiDAR point clouds has been extensively used in autonomous driving, intelligent robotics, and augmented reality. Although the one-stage 3D detector has satisfactory training and inference speed, there are still some performance problems due to insufficient utilization of bird\u2019s eye view (BEV) information. In this paper, a new backbone network is proposed to complete the cross-layer fusion of multi-scale BEV feature maps, which makes full use of various information for detection. Specifically, our proposed backbone network can be divided into a coarse branch and a fine branch. In the coarse branch, we use the pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of different levels and serves as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates different levels of multi-scale feature maps, thereby improving the expressive ability of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method has better performance in both 3D and BEV object detection compared with some previous state-of-the-art methods. Experimental results with average precision (AP) prove the effectiveness of our network.<\/jats:p>","DOI":"10.3390\/s21010136","type":"journal-article","created":{"date-parts":[[2020,12,28]],"date-time":"2020-12-28T10:33:56Z","timestamp":1609151636000},"page":"136","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4908-1150","authenticated-orcid":false,"given":"Fangyu","family":"Li","sequence":"first","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9993-8821","authenticated-orcid":false,"given":"Weizheng","family":"Jin","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4973-6444","authenticated-orcid":false,"given":"Cien","family":"Fan","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lian","family":"Zou","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4441-3769","authenticated-orcid":false,"given":"Qingsheng","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaopeng","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hao","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Electronic Information, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yifeng","family":"Liu","sequence":"additional","affiliation":[{"name":"National Engineering Laboratory for Risk Perception and Prevention (NEL-RPP), Beijing 100041, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 11\u201318). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_3","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_5","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Dong, C., Loy, C.C., He, K., and Tang, X. (2014, January 6\u201312). Learning a deep convolutional network for image super-resolution. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10593-2_13"},{"key":"ref_8","unstructured":"Kim, J., Kwon Lee, J., and Mu Lee, K. (July, January 26). Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lim, B., Son, S., Kim, H., Nah, S., and Mu Lee, K. (2017, January 21\u201326). Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.151"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/0600000079","article-title":"Computer vision for autonomous vehicles: Problems, datasets and state of the art","volume":"12","author":"Janai","year":"2020","journal-title":"Found. Trends Comput. Graph. Vis."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201322). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 16\u201320). Pointpillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhao, X., Huang, T., Hu, R., Zhou, Y., and Bai, X. (2020, January 7\u201312). TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6837"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, D., Zhang, H., Tang, J., Wang, M., Hua, X., and Sun, Q. (2020). Feature Pyramid Transformer. arXiv.","DOI":"10.1007\/978-3-030-58604-1_20"},{"key":"ref_16","unstructured":"Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., and Urtasun, R. (July, January 26). Monocular 3d object detection for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Li, B., Ouyang, W., Sheng, L., Zeng, X., and Wang, X. (2019, January 16\u201320). Gs3d: An efficient 3d object detection framework for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00111"},{"key":"ref_18","unstructured":"Ma, X., Wang, Z., Li, H., Zhang, P., Ouyang, W., and Fan, X. (November, January 27). Accurate monocular 3d object detection via color-embedded 3d reconstruction for autonomous driving. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_19","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3d proposal generation and object detection from view aggregation. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201322). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Wang, S., and Urtasun, R. (2018, January 8\u201314). Deep continuous fusion for multi-sensor 3d object detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_39"},{"key":"ref_24","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017, January 4\u20139). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 16\u201320). Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yang, B., Luo, W., and Urtasun, R. (2018, January 18\u201323). Pixor: Real-time 3d object detection from point clouds. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00798"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? the kitti vision benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (2018). Ipod: Intensive point-based object detector for point cloud. arXiv.","DOI":"10.1109\/ICCV.2019.00204"},{"key":"ref_31","unstructured":"Li, X., Guivant, J., Kwok, N., Xu, Y., Li, R., and Wu, H. (2019). Three-dimensional Backbone Network for 3D Object Detection in Traffic Scenes. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Kuang, H., Wang, B., An, J., Zhang, M., and Zhang, Z. (2020). Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds. Sensors, 20.","DOI":"10.3390\/s20030704"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Choi, J., Chun, D., Kim, H., and Lee, H.J. (2019, January 27\u201328). Gaussian yolov3: An accurate and fast object detector using localization uncertainty for autonomous driving. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00059"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, Z., Shi, J., Wang, X., and Li, H. (2020). From points to parts: 3d object detection from point cloud with part-aware and part-aggregation network. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2020.2977026"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020, January 13\u201319). Pv-rcnn: Point-voxel feature set abstraction for 3d object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01054"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/1\/136\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:46:55Z","timestamp":1760179615000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/1\/136"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,28]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["s21010136"],"URL":"https:\/\/doi.org\/10.3390\/s21010136","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,28]]}}}