{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:02:53Z","timestamp":1760230973391,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T00:00:00Z","timestamp":1660780800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Leading Innovative and Entrepreneur Team Introduction Program of Zhejiang","award":["2018R01001","226202200096"],"award-info":[{"award-number":["2018R01001","226202200096"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2018R01001","226202200096"],"award-info":[{"award-number":["2018R01001","226202200096"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Monocular 3D object detection is very challenging in autonomous driving due to the lack of depth information. This paper proposes a one-stage monocular 3D object detection network (MDS Net), which uses the anchor-free method to detect 3D objects in a per-pixel prediction. Firstly, a novel depth-based stratification structure is developed to improve the network\u2019s ability of depth prediction, which exploits the mathematical relationship between the size and the depth in the image of an object based on the pinhole model. Secondly, a new angle loss function is developed to further improve both the accuracy of the angle prediction and the convergence speed of training. An optimized Soft-NMS is finally applied in the post-processing stage to adjust the confidence score of the candidate boxes. Experiment results on the KITTI benchmark demonstrate that the proposed MDS-Net outperforms the existing monocular 3D detection methods in both tasks of 3D detection and BEV detection while fulfilling real-time requirements.<\/jats:p>","DOI":"10.3390\/s22166197","type":"journal-article","created":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T23:28:41Z","timestamp":1660865321000},"page":"6197","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["MDS-Net: Multi-Scale Depth Stratification 3D Object Detection from Monocular Images"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7544-4640","authenticated-orcid":false,"given":"Zhouzhen","family":"Xie","sequence":"first","affiliation":[{"name":"Institute of Marine Electronic and Intelligent System, Ocean College, Zhejiang University, Zhoushan 316021, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuying","family":"Song","sequence":"additional","affiliation":[{"name":"Institute of Marine Electronic and Intelligent System, Ocean College, Zhejiang University, Zhoushan 316021, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingxuan","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Marine Electronic and Intelligent System, Ocean College, Zhejiang University, Zhoushan 316021, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zecheng","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Marine Electronic and Intelligent System, Ocean College, Zhejiang University, Zhoushan 316021, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chunyi","family":"Song","sequence":"additional","affiliation":[{"name":"Institute of Marine Electronic and Intelligent System, Ocean College, Zhejiang University, Zhoushan 316021, China"},{"name":"The Engineering Research Center of Oceanic Sensing Technology and Equipment, Ministry of Education, Zhoushan 316021, China"},{"name":"Donghai Lab, Zhoushan 316021, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2279-0632","authenticated-orcid":false,"given":"Zhiwei","family":"Xu","sequence":"additional","affiliation":[{"name":"Institute of Marine Electronic and Intelligent System, Ocean College, Zhejiang University, Zhoushan 316021, China"},{"name":"The Engineering Research Center of Oceanic Sensing Technology and Equipment, Ministry of Education, Zhoushan 316021, China"},{"name":"Donghai Lab, Zhoushan 316021, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201323). VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum PointNets for 3D Object Detection From RGB-D Data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_3","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"He, C., Zeng, H., Huang, J., Hua, X.S., and Zhang, L. (2020, January 13\u201319). Structure aware single-stage 3D object detection from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01189"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Shi, W., and Rajkumar, R. (2020, January 13\u201319). Point-gnn: Graph neural network for 3D object detection in a point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00178"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Beltr\u00e1n, J., Guindel, C., Moreno, F.M., Cruzado, D., Garc\u00eda, F., and De La Escalera, A. (2018, January 4\u20137). BirdNet: A 3D Object Detection Framework from LiDAR Information. Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA.","DOI":"10.1109\/ITSC.2018.8569311"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Barrera, A., Guindel, C., Beltr\u00e1n, J., and Garc\u00eda, F. (2020, January 20\u201323). BirdNet+: End-to-End 3D Object Detection in LiDAR Bird\u2019s Eye View. Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece.","DOI":"10.1109\/ITSC45102.2020.9294293"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-View 3D Object Detection Network for Autonomous Driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander, S.L. (2018, January 1\u20135). Joint 3D Proposal Generation and Object Detection from View Aggregation. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594049"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016, January 5\u201310). R-FCN: Object Detection via Region-based Fully Convolutional Networks. Proceedings of the Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Erhan, D., Szegedy, C., Toshev, A., and Anguelov, D. (2014, January 24\u201327). Scalable Object Detection using Deep Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.276"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Leibe, B., Matas, J., Sebe, N., and Welling, M. (2016, January 11\u201314). SSD: Single Shot MultiBox Detector. Proceedings of the Computer Vision\u2014ECCV, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46478-7"},{"key":"ref_17","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). FCOS: Fully Convolutional One-Stage Object Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"7389","DOI":"10.1109\/TIP.2020.3002345","article-title":"FoveaBox: Beyound Anchor-Based Object Detection","volume":"29","author":"Kong","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_19","unstructured":"Weng, X., and Kitani, K. (November, January 27). Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops (ICCV), Seoul, Korea."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, B., and Chen, Z. (2018, January 18\u201323). Multi-Level Fusion Based 3D Object Detection From Monocular Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00249"},{"key":"ref_21","unstructured":"Ma, X., Wang, Z., Li, H., Zhang, P., Ouyang, W., and Fan, X. (November, January 27). Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., and Urtasun, R. (2016, January 27\u201330). Monocular 3D Object Detection for Autonomous Driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.236"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chabot, F., Chaouch, M., Rabarisoa, J., Teuliere, C., and Chateau, T. (2017, January 21\u201326). Deep MANTA: A Coarse-To-Fine Many-Task Network for Joint 2D and 3D Vehicle Analysis From Monocular Image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.198"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Mousavian, A., Anguelov, D., Flynn, J., and Kosecka, J. (2017, January 21\u201326). 3D Bounding Box Estimation Using Deep Learning and Geometry. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.597"},{"key":"ref_25","unstructured":"Brazil, G., and Liu, X. (November, January 27). M3D-RPN: Monocular 3D Region Proposal Network for Object Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Peng, S., Liu, Y., Huang, Q., Zhou, X., and Bao, H. (2019, January 16\u201320). PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00469"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhou, Y., He, Y., Zhu, H., Wang, C., Li, H., and Jiang, Q. (2021, January 19\u201325). Monocular 3D object detection: An extrinsic parameter free approach. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference.","DOI":"10.1109\/CVPR46437.2021.00747"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lu, Y., Ma, X., Yang, L., Zhang, T., Liu, Y., Chu, Q., Yan, J., and Ouyang, W. (2021, January 11\u201317). Geometry uncertainty projection network for monocular 3d object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual Conference.","DOI":"10.1109\/ICCV48922.2021.00310"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Huang, K.C., Wu, T.H., Su, H.T., and Hsu, W.H. (2022, January 19\u201324). MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00398"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Shi, X., Chen, Z., and Kim, T.K. (2020, January 23\u201328). Distance-Normalized Unified Representation for Monocular 3D Object Detection. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58526-6_6"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Brazil, G., Pons-Moll, G., Liu, X., and Schiele, B. (2020, January 23\u201328). Kinematic 3D object detection in monocular video. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58592-1_9"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Jiang, B., Luo, R., Mao, J., Xiao, T., and Jiang, Y. (2018, January 8\u201314). Acquisition of Localization Confidence for Accurate Object Detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_48"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Wang, T., Zhu, X., Pang, J., and Lin, D. (2021, January 11\u201317). FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual Conference.","DOI":"10.1109\/ICCVW54120.2021.00107"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, W., Wu, L., Chen, S., Hu, X., Li, J., Tang, J., and Yang, J. (2020). Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection. arXiv.","DOI":"10.1109\/CVPR46437.2021.01146"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). SECOND: Sparsely Embedded Convolutional Detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bodla, N., Singh, B., Chellappa, R., and Davis, L.S. (2017, January 22\u201329). Soft-NMS\u2014Improving Object Detection with One Line of Code. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.593"},{"key":"ref_39","unstructured":"Chen, X., Kundu, K., Zhu, Y., Berneshawi, A.G., Ma, H., Fidler, S., and Urtasun, R. (2015, January 7\u201312). 3D object proposals for accurate object class detection. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_40","unstructured":"Simonelli, A., Bulo, S.R., Porzi, L., Lopez-Antequera, M., and Kontschieder, P. (November, January 27). Disentangling Monocular 3D Object Detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Ding, M., Huo, Y., Yi, H., Wang, Z., Shi, J., Lu, Z., and Luo, P. (2020, January 14\u201319). Learning depth-guided convolutions for monocular 3D object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01169"},{"key":"ref_43","unstructured":"Wang, T., Zhu, X., Pang, J., and Lin, D. (2021, January 8\u201311). Probabilistic and Geometric Depth: Detecting Objects in Perspective. Proceedings of the Conference on Robot Learning (CoRL), London, UK."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/16\/6197\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:11:52Z","timestamp":1760141512000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/16\/6197"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,18]]},"references-count":43,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["s22166197"],"URL":"https:\/\/doi.org\/10.3390\/s22166197","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,8,18]]}}}