{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T16:16:38Z","timestamp":1775578598491,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2020,1,28]],"date-time":"2020-01-28T00:00:00Z","timestamp":1580169600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network, a novel one-stage 3D object detector that utilizes raw data from LIDAR sensors only. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. Encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas decoder fuses multiple feature maps from various scales by Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method has better performance on extracting features from point data and demonstrates its superiority over some baselines on the challenging KITTI-3D benchmark, obtaining good performance on both speed and accuracy in real-world scenarios.<\/jats:p>","DOI":"10.3390\/s20030704","type":"journal-article","created":{"date-parts":[[2020,1,28]],"date-time":"2020-01-28T09:37:09Z","timestamp":1580204229000},"page":"704","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":183,"title":["Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds"],"prefix":"10.3390","volume":"20","author":[{"given":"Hongwu","family":"Kuang","sequence":"first","affiliation":[{"name":"Hangzhou Hikvision Digital Technology Co. Ltd., Hangzhou 310052, China"}]},{"given":"Bei","family":"Wang","sequence":"additional","affiliation":[{"name":"Hangzhou Hikvision Digital Technology Co. Ltd., Hangzhou 310052, China"}]},{"given":"Jianping","family":"An","sequence":"additional","affiliation":[{"name":"Hangzhou Hikvision Digital Technology Co. Ltd., Hangzhou 310052, China"}]},{"given":"Ming","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hangzhou Hikvision Digital Technology Co. Ltd., Hangzhou 310052, China"}]},{"given":"Zehan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hangzhou Hikvision Digital Technology Co. Ltd., Hangzhou 310052, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,1,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_2","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_4","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_6","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3d proposal generation and object detection from view aggregation. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_9","unstructured":"Song, S., and Xiao, J. (July, January 26). Deep sliding shapes for amodal 3d object detection in rgb-d images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum pointnets for 3d object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Engelcke, M., Rao, D., Wang, D.Z., Tong, C.H., and Posner, I. (June, January 29). Vote3deep: Fast object detection in 3d point clouds using efficient convolutional neural networks. Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore.","DOI":"10.1109\/ICRA.2017.7989161"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 16\u201320). Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, B. (2017, January 24\u201328). 3d fully convolutional network for vehicle detection in point cloud. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8205955"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yu, S.L., Westfechtel, T., Hamada, R., Ohno, K., and Tadokoro, S. (2017, January 11\u201313). Vehicle detection and localization on bird\u2019s eye view elevation images using convolutional neural network. Proceedings of the 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), Shanghai, China.","DOI":"10.1109\/SSRR.2017.8088147"},{"key":"ref_16","first-page":"10","article-title":"Voting for Voting in Online Point Cloud Object Detection","volume":"Volume 1","author":"Wang","year":"2015","journal-title":"Robotics: Science and Systems"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Xiang, Y., Choi, W., Lin, Y., and Savarese, S. (2015, January 7\u201312). Data-driven 3d voxel patterns for object category recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298800"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Maturana, D., and Scherer, S. (October, January 28). Voxnet: A 3d convolutional neural network for real-time object recognition. Proceedings of the 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany.","DOI":"10.1109\/IROS.2015.7353481"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 16\u201320). PointPillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Aubry, M., Schlickewei, U., and Cremers, D. (2011, January 6\u201313). The wave kernel signature: A quantum mechanical approach to shape analysis. Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130444"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bronstein, M.M., and Kokkinos, I. (2010, January 13\u201318). Scale-invariant heat kernel signatures for non-rigid shape recognition. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539838"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1383","DOI":"10.1111\/j.1467-8659.2009.01515.x","article-title":"A concise and provably informative multi-scale signature based on heat diffusion","volume":"28","author":"Sun","year":"2009","journal-title":"Comput. Graph. Forum"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1111\/1467-8659.00669","article-title":"On visual similarity based 3D model retrieval","volume":"22","author":"Chen","year":"2003","journal-title":"Comput. Graph. Forum"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1109\/34.765655","article-title":"Using spin images for efficient object recognition in cluttered 3D scenes","volume":"21","author":"Johnson","year":"1999","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1109\/TPAMI.2007.41","article-title":"Shape classification using the inner-distance","volume":"29","author":"Ling","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Rusu, R.B., Blodow, N., and Beetz, M. (2009, January 12\u201317). Fast point feature histograms (FPFH) for 3D registration. Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan.","DOI":"10.1109\/ROBOT.2009.5152473"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Rusu, R.B., Blodow, N., Marton, Z.C., and Beetz, M. (2008, January 22\u201326). Aligning point cloud views using persistent feature histograms. Proceedings of the 2008 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Nice, France.","DOI":"10.1109\/IROS.2008.4650967"},{"key":"ref_29","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_30","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, NIPS."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201323). Voxelnet: End-to-end learning for point cloud based 3d object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Simony, M., Milzy, S., Amendey, K., and Gross, H.M. (2018, January 8\u201314). Complex-YOLO: An Euler-region-proposal for real-time 3D object detection on point clouds. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-11009-3_11"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yang, B., Luo, W., and Urtasun, R. (2018, January 18\u201323). Pixor: Real-time 3d object detection from point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00798"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The kitti vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_35","unstructured":"Chen, X., Kundu, K., Zhu, Y., Berneshawi, A.G., Ma, H., Fidler, S., and Urtasun, R. (2015). 3d object proposals for accurate object class detection. Advances in Neural Information Processing Systems, NIPS."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/3\/704\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:20:45Z","timestamp":1760361645000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/3\/704"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,28]]},"references-count":35,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2020,2]]}},"alternative-id":["s20030704"],"URL":"https:\/\/doi.org\/10.3390\/s20030704","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,28]]}}}