{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T11:22:00Z","timestamp":1768735320280,"version":"3.49.0"},"reference-count":49,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:00:00Z","timestamp":1675296000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Dr. Lin\u2019s start-up fund"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Perception and localization are essential for autonomous delivery vehicles, mostly estimated from 3D LiDAR sensors due to their precise distance measurement capability. This paper presents a strategy to obtain a real-time pseudo point cloud from image sensors (cameras) instead of laser-based sensors (LiDARs). Previous studies (such as PSMNet-based point cloud generation) built the algorithm based on accuracy but failed to operate in real time as LiDAR. We propose an approach to use different depth estimators to obtain pseudo point clouds similar to LiDAR to achieve better performance. Moreover, the depth estimator has used stereo imagery data to achieve more accurate depth estimation as well as point cloud results. Our approach to generating depth maps outperforms other existing approaches on KITTI depth prediction while yielding point clouds significantly faster than other approaches as well. 
Additionally, the proposed approach is evaluated on the KITTI stereo benchmark, where it shows effectiveness in runtime.<\/jats:p>","DOI":"10.3390\/s23031650","type":"journal-article","created":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T01:40:25Z","timestamp":1675388425000},"page":"1650","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Efficient Stereo Depth Estimation for Pseudo-LiDAR: A Self-Supervised Approach Based on Multi-Input ResNet Encoder"],"prefix":"10.3390","volume":"23","author":[{"given":"Sabir","family":"Hossain","sequence":"first","affiliation":[{"name":"Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L1G 0C5, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5695-248X","authenticated-orcid":false,"given":"Xianke","family":"Lin","sequence":"additional","affiliation":[{"name":"Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L1G 0C5, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1019","DOI":"10.1177\/0361198120933633","article-title":"Study of Road Autonomous Delivery Robots and Their Potential Effects on Freight Efficiency and Travel","volume":"2674","author":"Jennings","year":"2020","journal-title":"Transp. Res. Rec."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chang, J.R., and Chen, Y.S. (2018, January 18\u201323). Pyramid Stereo Matching Network. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00567"},{"key":"ref_3","unstructured":"Godard, C., Aodha, O.M., Firman, M., and Brostow, G. (November, January 27). Digging into Self-Supervised Monocular Depth Estimation. 
Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1231","DOI":"10.1177\/0278364913491297","article-title":"Vision Meets Robotics: The KITTI Dataset","volume":"32","author":"Geiger","year":"2013","journal-title":"Int. J. Rob. Res."},{"key":"ref_5","first-page":"234","article-title":"U-Net: Convolutional Networks for Biomedical Image Segmentation","volume":"Volume 9351","author":"Ronneberger","year":"2015","journal-title":"Medical Image Computing and Computer-Assisted Intervention\u2014MICCAI 2015"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Godard, C., Mac Aodha, O., and Brostow, G.J. (2017, January 21\u201326). Unsupervised Monocular Depth Estimation with Left-Right Consistency. Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.699"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yang, Z., Wang, P., Xu, W., Zhao, L., and Nevatia, R. (2018, January 2\u20137). Unsupervised Learning of Geometry from Videos with Edge-Aware Depth-Normal Consistency. Proceedings of the 32nd AAAI Conference on Artificial Intelligence AAAI 2018, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12257"},{"key":"ref_8","first-page":"1161","article-title":"Learning Depth from Single Monocular Images","volume":"18","author":"Saxena","year":"2005","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_9","first-page":"2366","article-title":"Depth Map Prediction from a Single Image Using a Multi-Scale Deep Network","volume":"27","author":"Eigen","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kundu, J.N., Uppala, P.K., Pahuja, A., and Babu, R.V. (2018, January 18\u201323). AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00281"},{"key":"ref_11","unstructured":"Lee, J.H., Han, M.-K., Ko, D.W., and Suh, I.H. (2019). From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Miangoleh, S.H.M., Dille, S., Mai, L., Paris, S., and Aksoy, Y. (2021, January 20\u201325). Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00956"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"842","DOI":"10.1007\/978-3-319-46493-0_51","article-title":"Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks","volume":"Volume 9908 LNCS","author":"Xie","year":"2016","journal-title":"Computer Vision\u2014ECCV 2016"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Pilzer, A., Xu, D., Puscas, M., Ricci, E., and Sebe, N. (2018, January 5\u20138). Unsupervised Adversarial Depth Estimation Using Cycled Generative Networks. Proceedings of the 2018 International Conference on 3D Vision, 3DV 2018, Verona, Italy.","DOI":"10.1109\/3DV.2018.00073"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Luo, Y., Ren, J., Lin, M., Pang, J., Sun, W., Li, H., and Lin, L. (2018, January 18\u201323). Single View Stereo Matching. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00024"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Guo, X., Yang, K., Yang, W., Wang, X., and Li, H. (2019, January 15\u201320). Group-Wise Correlation Stereo Network. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00339"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lipson, L., Teed, Z., and Deng, J. (2021, January 1\u20133). RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching. Proceedings of the 2021 International Conference on 3D Vision, 3DV 2021, London, UK.","DOI":"10.1109\/3DV53792.2021.00032"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gu, X., Fan, Z., Zhu, S., Dai, Z., Tan, F., and Tan, P. (2020, January 13\u201319). Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00257"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Jensen, R., Dahl, A., Vogiatzis, G., Tola, E., and Aanaes, H. (2014, January 23\u201328). Large Scale Multi-View Stereopsis Evaluation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.59"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shen, Z., Dai, Y., and Rao, Z. (2021, January 20\u201325). CFNET: Cascade and Fused Cost Volume for Robust Stereo Matching. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01369"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chabra, R., Straub, J., Sweeney, C., Newcombe, R., and Fuchs, H. (2019, January 15\u201320). Stereodrnet: Dilated Residual Stereonet. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01206"},{"key":"ref_22","first-page":"22158","article-title":"Hierarchical Neural Architecture Search for Deep Stereo Matching","volume":"2020","author":"Cheng","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"427","DOI":"10.5194\/isprsannals-II-3-W5-427-2015","article-title":"Joint 3d Estimation of Vehicles and Scene Flow","volume":"2","author":"Menze","year":"2015","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.isprsjprs.2017.09.013","article-title":"Object Scene Flow","volume":"140","author":"Menze","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Xu, G., Cheng, J., Guo, P., and Yang, X. (2022). Attention Concatenation Volume for Accurate and Efficient Stereo Matching. arXiv.","DOI":"10.1109\/CVPR52688.2022.01264"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image Quality Assessment: From Error Visibility to Structural Similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhou, T., Brown, M., Snavely, N., and Lowe, D.G. (2017, January 21\u201326). Unsupervised Learning of Depth and Ego-Motion from Video. 
Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.700"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, C., Buenaposada, J.M., Zhu, R., and Lucey, S. (2018, January 18\u201323). Learning Depth from Monocular Videos Using Direct Methods. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00216"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, Y., Chao, W.L., Garg, D., Hariharan, B., Campbell, M., and Weinberger, K.Q. (2019, January 15\u201320). Pseudo-Lidar from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00864"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., and Geiger, A. (2017, January 10\u201312). Sparsity Invariant CNNs. Proceedings of the 2017 International Conference on 3D Vision, 3DV 2017, Qingdao, China.","DOI":"10.1109\/3DV.2017.00012"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1007\/978-3-030-01252-6_30","article-title":"Learning Monocular Depth by Distilling Cross-Domain Stereo Networks","volume":"Volume 11215 LNCS","author":"Guo","year":"2018","journal-title":"Computer Vision\u2014ECCV 2018"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kuznietsov, Y., St\u00fcckler, J., and Leibe, B. (2017, January 21\u201326). Semi-Supervised Deep Learning for Monocular Depth Map Prediction. 
Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.238"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1007\/978-3-030-01237-3_50","article-title":"Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry","volume":"Volume 11212 LNCS","author":"Yang","year":"2018","journal-title":"Computer Vision\u2014ECCV 2018"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., and Tao, D. (2018, January 18\u201323). Deep Ordinal Regression Network for Monocular Depth Estimation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00214"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Mahjourian, R., Wicke, M., and Angelova, A. (2018, January 18\u201323). Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00594"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yin, Z., and Shi, J. (2018, January 18\u201323). GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00212"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., and Black, M.J. (2019, January 15\u201320). Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01252"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"2624","DOI":"10.1109\/TPAMI.2019.2930258","article-title":"Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding","volume":"42","author":"Luo","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Casser, V., Pirk, S., Mahjourian, R., and Angelova, A. (February, January 27). Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos. Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, HI, USA.","DOI":"10.1609\/aaai.v33i01.33018001"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1007\/978-3-319-46484-8_45","article-title":"Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue","volume":"Volume 9912 LNCS","author":"Garg","year":"2016","journal-title":"Computer Vision\u2014ECCV 2016"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Mehta, I., Sakurikar, P., and Narayanan, P.J. (2018, January 5\u20138). Structured Adversarial Training for Unsupervised Monocular Depth Estimation. Proceedings of the 2018 International Conference on 3D Vision, 3DV 2018, Verona, Italy.","DOI":"10.1109\/3DV.2018.00044"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Poggi, M., Tosi, F., and Mattoccia, S. (2018, January 5\u20138). Learning Monocular Depth Estimation with Unsupervised Trinocular Assumptions. 
Proceedings of the 2018 International Conference on 3D Vision, 3DV 2018, Verona, Italy.","DOI":"10.1109\/3DV.2018.00045"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Pillai, S., Ambru\u015f, R., and Gaidon, A. (2019, January 20\u201324). Superdepth: Self-Supervised, Super-Resolved Monocular Depth Estimation. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793621"},{"key":"ref_45","first-page":"124","article-title":"Disparity Estimation by Simultaneous Edge Drawing","volume":"Volume 10117 LNCS","author":"Sutherland","year":"2017","journal-title":"Computer Vision\u2014ACCV 2016 Workshops"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Teed, Z., and Deng, J. (2021, January 20\u201325). RAFT-3D: Scene Flow Using Rigid-Motion Embeddings. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00827"},{"key":"ref_47","unstructured":"Brickwedde, F., Abraham, S., and Mester, R. (November, January 27). Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Tosi, F., Aleotti, F., Poggi, M., and Mattoccia, S. (2019, January 15\u201320). Learning Monocular Depth Estimation Infusing Traditional Stereo Knowledge. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01003"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Seki, A., and Pollefeys, M. (2016, January 19\u201322). Patch Based Confidence Prediction for Dense Disparity Map. 
Proceedings of the British Machine Vision Conference 2016, BMVC 2016, York, UK.","DOI":"10.5244\/C.30.23"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1650\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:22:26Z","timestamp":1760120546000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1650"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,2]]},"references-count":49,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23031650"],"URL":"https:\/\/doi.org\/10.3390\/s23031650","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,2]]}}}