{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T13:55:55Z","timestamp":1772114155985,"version":"3.50.1"},"reference-count":40,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Achieving the accurate perception of occluded objects for autonomous vehicles is a challenging problem. Human vision can quickly locate important object regions in complex external scenes while only roughly analysing or ignoring other regions, a behaviour known as the visual attention mechanism. However, the perception system of an autonomous vehicle cannot know which part of the point cloud lies in the region of interest. Therefore, it is worthwhile to explore how the visual attention mechanism can be applied in autonomous driving perception systems. In this paper, we propose the spatial attention frustum model to address object occlusion in 3D object detection. The spatial attention frustum suppresses unimportant features and allocates limited neural computing resources to critical parts of the scene, thereby providing greater relevance and easier processing for higher-level perceptual reasoning tasks. To ensure that our method maintains good reasoning ability when faced with occluded objects of which only a partial structure is visible, we propose a local feature aggregation module to capture more complex local features of the point cloud. Finally, we discuss the projection constraint relationship between the 3D bounding box and the 2D bounding box and propose a joint anchor box projection loss function, which helps to improve the overall performance of our method. 
The results on the KITTI dataset show that our proposed method effectively improves the detection accuracy of occluded objects. Our method achieves 89.46%, 79.91% and 75.53% detection accuracy on the easy, moderate and hard difficulty levels of the car category, respectively, with a 6.97% performance improvement in the heavily occluded hard category. Our one-stage method achieves accuracy comparable to two-stage methods without relying on an additional refinement stage.<\/jats:p>","DOI":"10.3390\/s22062366","type":"journal-article","created":{"date-parts":[[2022,3,20]],"date-time":"2022-03-20T21:37:17Z","timestamp":1647812237000},"page":"2366","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Spatial Attention Frustum: A 3D Object Detection Method Focusing on Occluded Objects"],"prefix":"10.3390","volume":"22","author":[{"given":"Xinglei","family":"He","sequence":"first","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Xiaohan","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Yichun","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Hongzeng","family":"Ji","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Xiuhui","family":"Duan","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9884-0045","authenticated-orcid":false,"given":"Fen","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, 
Beijing Institute of Technology, Beijing 100081, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"104161","DOI":"10.1016\/j.imavis.2021.104161","article-title":"ThickSeg: Efficient semantic segmentation of large-scale 3D point clouds using multi-layer projection","volume":"108","author":"Gao","year":"2021","journal-title":"Image Vis. Comput."},{"key":"ref_2","first-page":"1","article-title":"GVnet: Gaussian model with voxel-based 3D detection network for autonomous driving","volume":"5","author":"Qin","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"8143","DOI":"10.1007\/s00521-020-04912-9","article-title":"Design of traffic object recognition system based on machine learning","volume":"33","author":"Li","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"29617","DOI":"10.1007\/s11042-021-11137-y","article-title":"A survey of 3D object detection","volume":"80","author":"Liang","year":"2021","journal-title":"Multimed. Tools Appl."},{"key":"ref_5","first-page":"334","article-title":"Review of rigid object pose estimation from a single image","volume":"26","author":"Yang","year":"2021","journal-title":"J. Image Graph."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.cag.2021.07.003","article-title":"A comprehensive survey of LIDAR-based 3D object detection methods with deep learning for autonomous driving","volume":"99","author":"Zamanakos","year":"2021","journal-title":"Comput. Graph."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3d proposal generation and object detection from view aggregation. 
Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.neucom.2015.12.101","article-title":"Structure-based object detection from scene point clouds","volume":"191","author":"Hao","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.neucom.2019.09.086","article-title":"SARPNET: Shape attention regional proposal network for liDAR-based 3D object detection","volume":"379","author":"Ye","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 24\u201328). Multi-view 3D Object Detection Network for Autonomous Driving. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201322). Frustum pointnets for 3D object detection from rgb-d data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, Z., and Jia, K. (2019, January 3\u20138). Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3D object detection. Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China.","DOI":"10.1109\/IROS40897.2019.8968513"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., and Urtasun, R. (2016, January 27\u201330). Monocular 3d object detection for autonomous driving. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.236"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.neucom.2020.03.076","article-title":"Monocular 3D vehicle detection with multi-instance depth and geometry reasoning for autonomous driving","volume":"403","author":"Zhang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_15","unstructured":"Brazil, G., and Liu, X. (November, January 27). M3d-rpn: Monocular 3d region proposal network for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Weng, X., and Kitani, K. (2019, January 27\u201328). Monocular 3D object detection with pseudo-lidar point cloud. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00114"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, X., Yin, W., Kong, T., Jiang, Y., Li, L., and Shen, C. (2020, January 7\u201312). Task-aware monocular depth estimation for 3D object detection. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6908"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"97228","DOI":"10.1109\/ACCESS.2021.3094201","article-title":"Yolo V4 for Advanced Traffic Sign Recognition with Synthetic Training Data Generated by Various GAN","volume":"9","author":"Dewi","year":"2021","journal-title":"IEEE Access"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201323). VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). SECOND: Sparsely Embedded Convolutional Detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 15\u201320). Pointpillars: Fast encoders for object detection from point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201320). Pointrcnn: 3D object proposal generation and detection from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TPAMI.2020.2977026","article-title":"From Points to Parts: 3D Object Detection from Point Cloud with Part-Aware and Part-Aggregation Network","volume":"43","author":"Shi","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ye, M., Xu, S., and Cao, T. (2020, January 14\u201319). Hvnet: Hybrid voxel network for lidar based 3D object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR42600.2020.00170"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wang, J., Lan, S., Gao, M., and Davis, L.S. (2020). InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23\u201328 August 2020, Springer.","DOI":"10.1007\/978-3-030-58607-2_24"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Meyer, G.P., Laddha, A., Kee, E., Vallespi-Gonzalez, C., and Wellington, C.K. (2019, January 15\u201320). Lasernet: An efficient probabilistic 3D object detector for autonomous driving. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01296"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, Z., Ding, S., Li, Y., Zhao, M., Roychowdhury, S., Wallin, A., Sapiro, G., and Qiu, Q. (2019, January 27\u201328). Range adaptation for 3D object detection in lidar. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00285"},{"key":"ref_28","unstructured":"Xu, D., Anguelov, D., and Jain, A. (2018, January 18\u201323). Pointfusion: Deep sensor fusion for 3D bounding box estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ding, Z., Han, X., and Niethammer, M. VoteNet: A Deep Learning Label Fusion Method for Multi-Atlas Segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13\u201317 October 2019, Springer.","DOI":"10.1007\/978-3-030-32248-9_23"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Chen, X., Litany, O., and Guibas, L.J. (2020, January 14\u201319). Imvotenet: Boosting 3D object detection in point clouds with image votes. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR42600.2020.00446"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhu, M., Ma, C., Ji, P., and Yang, X. 
(2021, January 5\u20139). Cross-modality 3D object detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00382"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Vora, S., Lang, A.H., Helou, B., and Beijbom, O. (2020, January 14\u201319). Pointpainting: Sequential fusion for 3D object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR42600.2020.00466"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, Q., Yang, B., Xie, L., Rosa, S., Guo, Y., Wang, Z., Trigoni, N., and Markham, A. (2020, January 14\u201319). Randla-net: Efficient semantic segmentation of large-scale point clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR42600.2020.01112"},{"key":"ref_34","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3D classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The kitti vision benchmark suite. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_37","first-page":"1706","article-title":"Pointnet++: Deep hierarchical feature learning on point sets in a metric space","volume":"02413","author":"Qi","year":"2017","journal-title":"arXiv"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Liang, M., Yang, B., Wang, S., and Urtasun, R. (2018, January 8\u201314). Deep continuous fusion for multi-sensor 3D object detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01270-0_39"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"6655","DOI":"10.1109\/TII.2020.3048719","article-title":"Three-Attention Mechanisms for One-Stage 3-D Object Detection Based on LiDAR and Camera","volume":"17","author":"Wen","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"22080","DOI":"10.1109\/ACCESS.2021.3055491","article-title":"Fast and Accurate 3D Object Detection for Lidar-Camera-Based Autonomous Vehicles Using One Shared Voxel-Based Backbone","volume":"9","author":"Wen","year":"2021","journal-title":"IEEE 
Access"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2366\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:39:07Z","timestamp":1760135947000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2366"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":40,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["s22062366"],"URL":"https:\/\/doi.org\/10.3390\/s22062366","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,18]]}}}