{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T10:12:17Z","timestamp":1768731137912,"version":"3.49.0"},"reference-count":35,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T00:00:00Z","timestamp":1736467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Sci."],"abstract":"<jats:p>Autonomous driving is the future trend. Accurate 3D object detection is a prerequisite for achieving autonomous driving. Currently, 3D object detection relies on three main sensors: monocular cameras, stereo cameras, and lidar. In comparison to methods based on stereo cameras and lidar, monocular 3D object detection offers advantages such as a broad detection field and low deployment costs. However, the accuracy of existing monocular 3D object detection methods is not ideal, especially for occluded targets. To tackle this challenge, the paper introduces a novel approach for monocular 3D object detection, denoted as SRDDP-M3D, aiming to improve monocular 3D object detection by considering spatial relationships between targets, and by refining depth predictions through a decoupled approach. We consider how objects are positioned relative to each other in the environment and encode the spatial relationships between neighboring objects, the detection performance is enhanced specially for occluded targets. Furthermore, a strategy of decoupling the prediction of target depth into two components of target visual depth and target attribute depth is introduced. This decoupling is designed to improve the accuracy of predicting the overall depth of the target. Experimental results using the KITTI dataset demonstrate that this approach substantially enhances the detection accuracy of occluded targets.<\/jats:p>","DOI":"10.3389\/fcomp.2024.1382080","type":"journal-article","created":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T06:12:43Z","timestamp":1736489563000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Monocular 3D object detection for occluded targets based on spatial relationships and decoupled depth predictions"],"prefix":"10.3389","volume":"6","author":[{"given":"Yanfei","family":"Gao","sequence":"first","affiliation":[]},{"given":"Xiongwei","family":"Miao","sequence":"additional","affiliation":[]},{"given":"Guoye","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,1,10]]},"reference":[{"key":"ref1","first-page":"9287","article-title":"M3d-rpn: monocular 3d region proposal network for object detection","volume-title":"Proceedings of the IEEE international conference on computer vision","author":"Brazil","year":"2019"},{"key":"ref2","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1007\/978-3-030-58592-1_9","article-title":"Kinematic 3d object detection in monocular video","volume-title":"Computer vision \u2013 ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXIII","author":"Brazil","year":"2020"},{"key":"ref3","doi-asserted-by":"publisher","first-page":"1259","DOI":"10.1109\/TPAMI.2017.2706685","article-title":"3d object proposals using stereo imagery for accurate object class detection","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref4","first-page":"12533","article-title":"DSGN: Deep stereo geometry network for 3D object detection","author":"Chen","year":"2020"},{"key":"ref5","first-page":"12093","article-title":"Monopair: monocular 3D object detection using pairwise spatial relationships","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chen","year":"2020"},{"key":"ref6","first-page":"11672","article-title":"Learning depth-guided convolutions for monocular 3d object detection","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ding","year":"2020"},{"key":"ref7","first-page":"2002","article-title":"Deep ordinal regression network for monocular depth estimation","volume-title":"Conference on Computer Vision and Pattern Recognition","author":"Fu","year":"2018"},{"key":"ref8","doi-asserted-by":"crossref","first-page":"3354","DOI":"10.1109\/CVPR.2012.6248074","article-title":"Are we ready for autonomous driving? The Kitti vision benchmark suite","volume-title":"2012 IEEE Conference on Computer Vision and Pattern Recognition","author":"Geiger","year":"2012"},{"key":"ref9","first-page":"3153","article-title":"LIGA-stereo: learning lidar geometry aware representations for stereo-based 3D detector","volume-title":"International Conference on Computer vision","author":"Guo","year":"2021"},{"key":"ref10","first-page":"7132","article-title":"Squeeze-and-excitation networks","author":"Hu","year":"2018"},{"key":"ref11","doi-asserted-by":"publisher","first-page":"1412.6980","DOI":"10.48550\/arXiv.1412.6980","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2014","journal-title":"arXiv preprint arXiv"},{"key":"ref12","first-page":"8973","article-title":"GrooMeD-NMS: grouped mathematically differen-tiable NMS for monocular 3D object detection","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Kumar","year":"2021"},{"key":"ref13","first-page":"7644","article-title":"Stereo R-CNN based 3D object detection for autonomous driving","volume-title":"Conference on Computer Vision and Pattern Recognition","author":"Li","year":"2019"},{"key":"ref14","first-page":"1019","article-title":"GS3D: an efficient 3D object detection frame-work for autonomous driving","volume-title":"Conference on computer vision and pattern recognition","author":"Li","year":"2019"},{"key":"ref15","article-title":"Scalable vision-based 3D object detection and monocular depth estimation for autonomous driving","author":"Liu","year":"2024","journal-title":"arXiv:2403.02037"},{"key":"ref16","article-title":"Learning spatial fusion for single-shot object detection","author":"Liu","year":"2019"},{"key":"ref17","doi-asserted-by":"publisher","first-page":"1810","DOI":"10.1609\/aaai.v36i2.20074","article-title":"Learning auxiliary monocular contexts helps monocular 3D object detection","volume":"36","author":"Liu","year":"2021","journal-title":"arXiv preprint arXiv"},{"key":"ref18","first-page":"15641","article-title":"Autoshape: real-time shape-aware monocular 3d object detection","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Liu","year":"2021"},{"key":"ref19","first-page":"3111","article-title":"Geometry uncertainty projection network for monocular 3d object detection","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Lu","year":"2021"},{"key":"ref20","first-page":"4721","article-title":"Delving into localization errors for monocular 3d object detection","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ma","year":"2021"},{"key":"ref21","first-page":"5632","article-title":"3D bounding box estimation using deep learning and geometry","volume-title":"Conference on Computer Vision and Pattern Recognition","author":"Mousavian","year":"2017"},{"key":"ref22","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/978-3-031-19769-7_5","article-title":"DID-M3D: decoupling instance depth for monocular 3D object detection","volume-title":"Computer Vision \u2013 ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327, 2022, proceedings, part I","author":"Peng","year":"2022"},{"key":"ref23","doi-asserted-by":"publisher","first-page":"108796","DOI":"10.1016\/j.patcog.2022.108796","article-title":"3D object detection for autonomous driving: a survey","volume":"130","author":"Qian","year":"2022","journal-title":"Pattern Recogn."},{"key":"ref24","first-page":"8555","article-title":"Categorical depth distribution network for monocular 3D object detection","volume-title":"Conference on Computer Vision and Pattern Recognition","author":"Reading","year":"2021"},{"key":"ref25","first-page":"15172","article-title":"Geometry-based distance decomposition for monocular 3D object detection","volume-title":"International Conference on Computer Vision","author":"Shi","year":"2021"},{"key":"ref26","first-page":"1991","article-title":"Disen-tangling monocular 3d object detection","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Simonelli","year":"2019"},{"key":"ref27","first-page":"8445","article-title":"Pseudo-lidar from visual depth estimation: bridging the gap in 3D object detection for autonomous driving","volume-title":"Conference on Computer Vision and Pattern Recognition","author":"Wang","year":"2019"},{"key":"ref28","first-page":"857","article-title":"Monocular 3D object detection with pseudo-lidar point cloud","volume-title":"International Conference on Computer Vision Workshops","author":"Weng","year":"2019"},{"key":"ref29","doi-asserted-by":"publisher","first-page":"e2208191","DOI":"10.2174\/18744478-v16-e2208191","article-title":"Deep 3D dynamic object detection towards successful and safe navigation for full autonomous driving","volume":"16","author":"Wijesekara","year":"2022","journal-title":"Open Transport. J."},{"key":"ref30","first-page":"3","article-title":"Cbam: convolutional block attention module","author":"Woo","year":"2018"},{"key":"ref31","article-title":"Mono CD: Monocular 3D object detection with complementary depths","author":"Yan","year":"2024"},{"key":"ref32","article-title":"Pseudo-lidar++: Accurate depth for 3D object detection in autonomous driving","volume-title":"International Conference on Learning Representations","author":"You","year":"2020"},{"key":"ref33","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2018.00255","article-title":"Deep layer aggregation","volume-title":"CVPR","author":"Yu","year":"2018"},{"key":"ref34","first-page":"3289","article-title":"Objects are different: Flexible monocular 3D object detection","volume-title":"Conference on Computer Vision and Pattern Recognition","author":"Zhang","year":"2021"},{"key":"ref35","first-page":"07850","article-title":"Objects as points","volume":"1904","author":"Zhou","year":"2019","journal-title":"arXiv preprint arXiv"}],"container-title":["Frontiers in Computer Science"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1382080\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T06:12:57Z","timestamp":1736489577000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1382080\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,10]]},"references-count":35,"alternative-id":["10.3389\/fcomp.2024.1382080"],"URL":"https:\/\/doi.org\/10.3389\/fcomp.2024.1382080","relation":{},"ISSN":["2624-9898"],"issn-type":[{"value":"2624-9898","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,10]]},"article-number":"1382080"}}