{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T15:04:32Z","timestamp":1778598272596,"version":"3.51.4"},"reference-count":148,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T00:00:00Z","timestamp":1658102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Secretaria d'Universitats i Recerca del Departament d'Empresa i Coneixement de la Generalitat de Catalunya","award":["2020 FISDU 00405"],"award-info":[{"award-number":["2020 FISDU 00405"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In current decades, significant advancements in robotics engineering and autonomous vehicles have improved the requirement for precise depth measurements. Depth estimation (DE) is a traditional task in computer vision that can be appropriately predicted by applying numerous procedures. This task is vital in disparate applications such as augmented reality and target tracking. Conventional monocular DE (MDE) procedures are based on depth cues for depth prediction. Various deep learning techniques have demonstrated their potential applications in managing and supporting the traditional ill-posed problem. The principal purpose of this paper is to represent a state-of-the-art review of the current developments in MDE based on deep learning techniques. For this goal, this paper tries to highlight the critical points of the state-of-the-art works on MDE from disparate aspects. These aspects include input data shapes and training manners such as supervised, semi-supervised, and unsupervised learning approaches in combination with applying different datasets and evaluation indicators. 
At last, limitations regarding the accuracy of the DL-based MDE models, computational time requirements, real-time inference, transferability, input images shape and domain adaptation, and generalization are discussed to open new directions for future research.<\/jats:p>","DOI":"10.3390\/s22145353","type":"journal-article","created":{"date-parts":[[2022,7,19]],"date-time":"2022-07-19T00:19:21Z","timestamp":1658189961000},"page":"5353","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":142,"title":["Monocular Depth Estimation Using Deep Learning: A Review"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6392-4727","authenticated-orcid":false,"given":"Armin","family":"Masoumian","sequence":"first","affiliation":[{"name":"Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain"},{"name":"Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5421-1637","authenticated-orcid":false,"given":"Hatem A.","family":"Rashwan","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6317-6538","authenticated-orcid":false,"given":"Juli\u00e1n","family":"Cristiano","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5993-3903","authenticated-orcid":false,"given":"M. 
Salman","family":"Asif","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of California, Riverside, CA 92521, USA"}]},{"given":"Domenec","family":"Puig","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering and Mathematics, University of Rovira i Virgili, 43007 Tarragona, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,18]]},"reference":[{"key":"ref_1","unstructured":"Sun, X., Xu, Z., Meng, N., Lam, E.Y., and So, H.K.H. (2016, January 24\u201329). Data-driven light field depth estimation using deep Convolutional Neural Networks. Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2021","DOI":"10.1364\/JOSAA.32.002021","article-title":"Computational photography with plenoptic camera and light field capture: Tutorial","volume":"32","author":"Lam","year":"2015","journal-title":"J. Opt. Soc. Am. A"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Khan, W., Ansell, D., Kuru, K., and Amina, M. (2016, January 4\u20136). Automated aircraft instrument reading using real time video analysis. Proceedings of the 2016 IEEE 8th International Conference on Intelligent Systems (IS), Sofia, Bulgaria.","DOI":"10.1109\/IS.2016.7737454"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Khan, W., Hussain, A., Kuru, K., and Al-Askar, H. (2020). Pupil localisation and eye centre estimation using machine learning and computer vision. Sensors, 20.","DOI":"10.3390\/s20133785"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5667264","DOI":"10.1155\/2022\/5667264","article-title":"PSOWNNs-CNN: A Computational Radiology for Breast Cancer Diagnosis Improvement Based on Image Processing Using Machine Learning Methods","volume":"2022","author":"Nomani","year":"2022","journal-title":"Comput. Intell. 
Neurosci."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1007\/s10207-015-0286-9","article-title":"Understanding trust in privacy-aware video surveillance systems","volume":"15","author":"Rashwan","year":"2016","journal-title":"Int. J. Inf. Secur."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Godard, C., Aodha, O.M., and Brostow, G.J. (2017, January 21\u201326). Unsupervised Monocular Depth Estimation with Left-Right Consistency. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.699"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2024","DOI":"10.1109\/TPAMI.2015.2505283","article-title":"Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields","volume":"38","author":"Liu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","unstructured":"Eigen, D., Puhrsch, C., and Fergus, R. (2014). Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. Adv. Neural Inf. Process. Syst., 27."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Cocia\u015f, T.T., Grigorescu, S.M., and Moldoveanu, F. (2012, January 24\u201326). Multiple-superquadrics based object surface estimation for grasping in service robotics. Proceedings of the 2012 13th International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), Brasov, Romania.","DOI":"10.1109\/OPTIM.2012.6231780"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kalia, M., Navab, N., and Salcudean, T. (2019, January 20\u201324). A Real-Time Interactive Augmented Reality Depth Estimation Technique for Surgical Robotics. 
Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793610"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1229","DOI":"10.1007\/s11432-012-4587-6","article-title":"An overview of computational photography","volume":"55","author":"Suo","year":"2012","journal-title":"Sci. China Inf. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lukac, R. (2017). Computational Photography: Methods and Applications, CRC Press.","DOI":"10.1201\/b10284"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Masoumian, A., Kazemi, P., Montazer, M.C., Rashwan, H.A., and Valls, D.P. (2020, January 12\u201315). Using The Feedback of Dynamic Active-Pixel Vision Sensor (Davis) to Prevent Slip in Real Time. Proceedings of the 2020 6th International Conference on Mechatronics and Robotics Engineering (ICMRE), Barcelona, Spain.","DOI":"10.1109\/ICMRE49073.2020.9065017"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.neucom.2020.12.089","article-title":"Deep Learning for Monocular Depth Estimation: A Review","volume":"438","author":"Ming","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhou, T., Brown, M., Snavely, N., and Lowe, D.G. (2017, January 21\u201326). Unsupervised learning of depth and ego-motion from video. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.700"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Khan, F., Salahuddin, S., and Javidnia, H. (2020). Deep learning-based monocular depth estimation methods\u2014A state-of-the-art review. Sensors, 20.","DOI":"10.3390\/s20082272"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tosi, F., Aleotti, F., Poggi, M., and Mattoccia, S. (2019, January 15\u201320). 
Learning monocular depth estimation infusing traditional stereo knowledge. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01003"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ramamonjisoa, M., and Lepetit, V. (2019, January 27\u201328). Sharpnet: Fast and accurate recovery of occluding contours in monocular depth estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00266"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., and Frahm, J.M. (2016, January 27\u201330). Structure-from-motion revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Javidnia, H., and Corcoran, P. (2017, January 22\u201329). Accurate depth map estimation from small motions. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.289"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1023\/A:1014573219977","article-title":"A taxonomy and evaluation of dense two-frame stereo correspondence algorithms","volume":"47","author":"Scharstein","year":"2002","journal-title":"Int. J. Comput. Vis."},{"key":"ref_23","unstructured":"Heikkila, J., and Silv\u00e9n, O. (1997, January 17\u201319). A four-step camera calibration procedure with implicit image correction. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, PR, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1330","DOI":"10.1109\/34.888718","article-title":"A flexible new technique for camera calibration","volume":"22","author":"Zhang","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"5509","DOI":"10.1109\/ACCESS.2016.2603220","article-title":"A depth map post-processing approach based on adaptive random walk with restart","volume":"4","author":"Javidnia","year":"2016","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kuznietsov, Y., Stuckler, J., and Leibe, B. (2017, January 21\u201326). Semi-supervised deep learning for monocular depth map prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.238"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"043041","DOI":"10.1117\/1.JEI.27.4.043041","article-title":"Semiparallel deep neural network hybrid architecture: First application on depth from monocular camera","volume":"27","author":"Bazrafkan","year":"2018","journal-title":"J. Electron. Imaging"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Fu, H., Gong, M., Wang, C., Batmanghelich, K., and Tao, D. (2018, January 18\u201323). Deep ordinal regression network for monocular depth estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00214"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1167\/9.1.10","article-title":"Binocular depth discrimination and estimation beyond interaction space","volume":"9","author":"Allison","year":"2009","journal-title":"J. Vis."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1167\/10.6.19","article-title":"Stereoscopic perception of real depths at large distances","volume":"10","author":"Palmisano","year":"2010","journal-title":"J. 
Vis."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"3441","DOI":"10.1016\/0042-6989(96)00090-9","article-title":"Stereoscopic depth constancy depends on the subject\u2019s task","volume":"36","author":"Glennerster","year":"1996","journal-title":"Vis. Res."},{"key":"ref_32","unstructured":"S\u00fcvari, C.B. (2021). Semi-Supervised Iterative Teacher-Student Learning for Monocular Depth Estimation. [Master\u2019s Thesis, Middle East Technical University]."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Mahjourian, R., Wicke, M., and Angelova, A. (2018, January 18\u201323). Unsupervised learning of depth and ego-motion from monocular video using 3d geometric constraints. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00594"},{"key":"ref_34","unstructured":"Masoumian, A., Rashwan, H.A., Abdulwahab, S., Cristiano, J., and Puig, D. (2021). GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"CS Kumar, A., Bhandarkar, S.M., and Prasad, M. (2018, January 18\u201323). Depthnet: A recurrent neural network architecture for monocular depth prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00066"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1109\/LRA.2017.2657002","article-title":"Toward domain independence for learning-based monocular depth estimation","volume":"2","author":"Mancini","year":"2017","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Qi, X., Liao, R., Liu, Z., Urtasun, R., and Jia, J. (2018, January 18\u201323). Geonet: Geometric neural network for joint depth and surface normal estimation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00037"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ummenhofer, B., Zhou, H., Uhrig, J., Mayer, N., Ilg, E., Dosovitskiy, A., and Brox, T. (2017, January 21\u201326). Demon: Depth and motion network for learning monocular stereo. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.596"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhan, H., Garg, R., Weerasekera, C.S., Li, K., Agarwal, H., and Reid, I. (2018, January 18\u201323). Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00043"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Garg, R., Bg, V.K., Carneiro, G., and Reid, I. (2016, January 11\u201314). Unsupervised cnn for single view depth estimation: Geometry to the rescue. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_45"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Luo, Y., Ren, J., Lin, M., Pang, J., Sun, W., Li, H., and Lin, L. (2018, January 18\u201323). Single view stereo matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00024"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Xie, J., Girshick, R., and Farhadi, A. (2016). Deep3d: Fully automatic 2d-to-3d video conversion with deep convolutional neural networks. 
European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46493-0_51"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.1007\/s11431-020-1582-8","article-title":"Monocular depth estimation based on deep learning: An overview","volume":"63","author":"Zhao","year":"2020","journal-title":"Sci. China Technol. Sci."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Dong, X., Garratt, M.A., Anavatti, S.G., and Abbass, H.A. (2021). Towards real-time monocular depth estimation for robotics: A survey. arXiv.","DOI":"10.1109\/TITS.2022.3160741"},{"key":"ref_45","unstructured":"Vyas, P., Saxena, C., Badapanda, A., and Goswami, A. (2022). Outdoor Monocular Depth Estimation: A Research Review. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"7152","DOI":"10.1364\/AO.52.007152","article-title":"Passive depth estimation using chromatic aberration and a depth from defocus approach","volume":"52","author":"Champagnat","year":"2013","journal-title":"Appl. Opt."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Rodrigues, R.T., Miraldo, P., Dimarogonas, D.V., and Aguiar, A.P. (August, January 31). Active depth estimation: Stability analysis and its applications. Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196670"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"29375","DOI":"10.1007\/s11042-020-09479-0","article-title":"Analysis of RGB-D camera technologies for supporting different facial usage scenarios","volume":"79","author":"Ulrich","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Kim, H.M., Kim, M.S., Lee, G.J., Jang, H.J., and Song, Y.M. (2020). Miniaturized 3D depth sensing-based smartphone light field camera. 
Sensors, 20.","DOI":"10.3390\/s20072129"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1283","DOI":"10.1109\/34.735802","article-title":"A variable window approach to early vision","volume":"20","author":"Boykov","year":"1998","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Meng, Z., Kong, X., Meng, L., and Tomiyama, H. (2021). Stereo Vision-Based Depth Estimation. Advances in Artificial Intelligence and Data Engineering, Springer.","DOI":"10.1007\/978-981-15-3514-7_90"},{"key":"ref_52","unstructured":"Sanz, P.R., Mezcua, B.R., and Pena, J.M.S. (2012). Depth Estimation\u2014An Introduction, IntechOpen."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1109\/CVPR.1999.786928","article-title":"Computing rectifying homographies for stereo vision","volume":"Volume 1","author":"Loop","year":"1999","journal-title":"Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)"},{"key":"ref_54","unstructured":"Fusiello, A., Trucco, E., and Verri, A. (1997, January 8\u201311). Rectification with unconstrained stereo geometry. Proceedings of the British Machine Vision Conference (BMVC), Colchester, UK."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Kat, R., Jevnisek, R., and Avidan, S. (2018, January 18\u201323). Matching pixels using co-occurrence statistics. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00188"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"108635","DOI":"10.1016\/j.measurement.2020.108635","article-title":"Stereo-rectification and homography-transform-based stereo matching methods for stereo digital image correlation","volume":"173","author":"Zhong","year":"2021","journal-title":"Measurement"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Zhou, K., Meng, X., and Cheng, B. (2020). Review of stereo matching algorithms based on deep learning. Comput. Intell. Neurosci.","DOI":"10.1155\/2020\/8562323"},{"key":"ref_58","unstructured":"Alagoz, B.B. (2008). Obtaining depth maps from color images by region based stereo matching algorithms. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Luo, W., Schwing, A.G., and Urtasun, R. (2016, January 27\u201330). Efficient deep learning for stereo matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.614"},{"key":"ref_60","first-page":"133","article-title":"A Multistage Hybrid Median Filter Design of Stereo Matching Algorithms on Image Processing","volume":"10","author":"Aboali","year":"2018","journal-title":"J. Telecommun. Electron. Comput. Eng. (JTEC)"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"34221","DOI":"10.1007\/s11042-020-09906-2","article-title":"Hardware-friendly architecture for a pseudo 2D weighted median filter based on sparse-window approach","volume":"80","author":"Hyun","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"da Silva Vieira, G., Soares, F.A.A., Laureano, G.T., Parreira, R.T., Ferreira, J.C., and Salvini, R. (2018, January 25\u201328). Disparity Map Adjustment: A Post-Processing Technique. 
Proceedings of the 2018 IEEE Symposium on Computers and Communications (ISCC), Natal, Brazil.","DOI":"10.1109\/ISCC.2018.8538562"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? the kitti vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Hausser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (2016, January 27\u201330). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"7733","DOI":"10.1109\/TITS.2021.3071886","article-title":"Deep direct visual odometry","volume":"23","author":"Zhao","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Wang, P., Chen, P., Yuan, Y., Liu, D., Huang, Z., Hou, X., and Cottrell, G. (2018, January 12\u201315). Understanding convolution for semantic segmentation. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00163"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Xue, F., Wang, X., Li, S., Wang, Q., Wang, J., and Zha, H. (2019, January 15\u201320). Beyond tracking: Selecting memory and refining poses for deep visual odometry. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00877"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Clark, R., Wang, S., Wen, H., Markham, A., and Trigoni, N. 
(2017, January 4\u20139). Vinet: Visual-inertial odometry as a sequence-to-sequence learning problem. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11215"},{"key":"ref_69","unstructured":"Bhat, S.F., Alhashim, I., and Wonka, P. (2021, January 20\u201325). Adabins: Depth estimation using adaptive bins. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Wang, R., Pizer, S.M., and Frahm, J.M. (2019, January 15\u201320). Recurrent neural network for (un-) supervised learning of monocular video visual odometry and depth. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00570"},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"6813","DOI":"10.1109\/LRA.2020.3017478","article-title":"Don\u2019t forget the past: Recurrent depth estimation from monocular video","volume":"5","author":"Patil","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_72","unstructured":"Lee, J.H., Han, M.K., Ko, D.W., and Suh, I.H. (2019). From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Kuznietsov, Y., Proesmans, M., and Van Gool, L. (2021, January 3\u20138). Comoda: Continuous monocular depth adaptation using past experiences. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00295"},{"key":"ref_74","unstructured":"Ramirez, P.Z., Poggi, M., Tosi, F., Mattoccia, S., and Di Stefano, L. (2018, January 2\u20136). Geometry meets semantics for semi-supervised monocular depth estimation. 
Proceedings of the Asian Conference on Computer Vision, Perth, Australia."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Aleotti, F., Tosi, F., Poggi, M., and Mattoccia, S. (2018, January 8\u201314). Generative adversarial networks for unsupervised monocular depth prediction. Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany.","DOI":"10.1007\/978-3-030-11009-3_20"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Pilzer, A., Xu, D., Puscas, M., Ricci, E., and Sebe, N. (2018, January 5\u20138). Unsupervised adversarial depth estimation using cycled generative networks. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00073"},{"key":"ref_77","unstructured":"Watson, J., Firman, M., Brostow, G.J., and Turmukhambetov, D. (November, January 27). Self-supervised monocular depth hints. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Yin, Z., and Shi, J. (2018, January 18\u201323). Geonet: Unsupervised learning of dense depth, optical flow and camera pose. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00212"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Casser, V., Pirk, S., Mahjourian, R., and Angelova, A. (2019, January 27). Depth prediction without the sensors: Leveraging structure for unsupervised learning from monocular videos. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA.","DOI":"10.1609\/aaai.v33i01.33018001"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., and Black, M.J. (2019, January 15\u201320). Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01252"},{"key":"ref_81","unstructured":"Gordon, A., Li, H., Jonschkowski, R., and Angelova, A. (November, January 27). Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_82","unstructured":"Zhou, J., Wang, Y., Qin, K., and Zeng, W. (November, January 27). Unsupervised high-resolution depth learning from videos with dual networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_83","unstructured":"Godard, C., Mac Aodha, O., Firman, M., and Brostow, G.J. (November, January 27). Digging into self-supervised monocular depth estimation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Shu, C., Yu, K., Duan, Z., and Yang, K. (2020, January 23\u201328). Feature-metric loss for self-supervised learning of depth and egomotion. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58529-7_34"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Silberman, N., Hoiem, D., Kohli, P., and Fergus, R. (2012, January 7\u201313). Indoor segmentation and support inference from rgbd images. Proceedings of the European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"ref_86","unstructured":"Teed, Z., and Deng, J. (2018). Deepv2d: Video to depth with differentiable structure from motion. arXiv."},{"key":"ref_87","unstructured":"Yin, W., Liu, Y., Shen, C., and Yan, Y. (November, January 27). Enforcing geometric constraints of virtual normal for depth prediction. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Yu, Z., and Gao, S. (2020, January 13\u201319). Fast-mvsnet: Sparse-to-dense multi-view stereo with learned propagation and gauss-newton refinement. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00202"},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Zhao, S., Fu, H., Gong, M., and Tao, D. (2019, January 15\u201320). Geometry-aware symmetric domain adaptation for monocular depth estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01002"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Jung, D., Choi, J., Lee, Y., Kim, D., Kim, C., Manocha, D., and Lee, D. (2021, January 10\u201317). DnD: Dense Depth Estimation in Crowded Dynamic Indoor Scenes. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01256"},{"key":"ref_91","unstructured":"Alhashim, I., and Wonka, P. (2018). High quality monocular depth estimation via transfer learning. arXiv."},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Ma, F., Cavalheiro, G.V., and Karaman, S. (2019, January 20\u201324). Self-supervised sparse-to-dense: Self-supervised depth completion from lidar and monocular camera. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793637"},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Guizilini, V., Ambrus, R., Pillai, S., Raventos, A., and Gaidon, A. (2020, January 13\u201319). 3d packing for self-supervised monocular depth estimation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00256"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016, January 27\u201330). The cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.350"},{"key":"ref_95","first-page":"35","article-title":"Unsupervised scale-consistent depth and ego-motion learning from monocular video","volume":"32","author":"Bian","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1109\/TPAMI.2008.132","article-title":"Make3d: Learning 3d scene structure from a single still image","volume":"31","author":"Saxena","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_97","first-page":"1571","article-title":"Make3D: Depth Perception from a Single Still Image","volume":"3","author":"Saxena","year":"2008","journal-title":"AAAI"},{"key":"ref_98","doi-asserted-by":"crossref","first-page":"2144","DOI":"10.1109\/TPAMI.2014.2316835","article-title":"Depth transfer: Depth extraction from video using non-parametric sampling","volume":"36","author":"Karsch","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Liu, M., Salzmann, M., and He, X. (2014, January 23\u201328). Discrete-continuous depth estimation from a single image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.97"},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Laina, I., Rupprecht, C., Belagiannis, V., Tombari, F., and Navab, N. (2016, January 25\u201328). 
Deeper depth prediction with fully convolutional residual networks. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.32"},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Wang, C., Buenaposada, J.M., Zhu, R., and Lucey, S. (2018, January 18\u201323). Learning depth from monocular videos using direct methods. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00216"},{"key":"ref_102","doi-asserted-by":"crossref","unstructured":"Jia, S., Pei, X., Yao, W., and Wong, S. (2021). Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos. arXiv.","DOI":"10.1109\/TITS.2022.3219604"},{"key":"ref_103","unstructured":"Vasiljevic, I., Kolkin, N., Zhang, S., Luo, R., Wang, H., Dai, F.Z., Daniele, A.F., Mostajabi, M., Basart, S., and Walter, M.R. (2019). Diode: A dense indoor and outdoor depth dataset. arXiv."},{"key":"ref_104","doi-asserted-by":"crossref","unstructured":"Scharstein, D., Hirschm\u00fcller, H., Kitajima, Y., Krathwohl, G., Ne\u0161i\u0107, N., Wang, X., and Westling, P. (2014, January 2\u20135). High-resolution stereo datasets with subpixel-accurate ground truth. Proceedings of the German Conference on Pattern Recognition, M\u00fcnster, Germany.","DOI":"10.1007\/978-3-319-11752-2_3"},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Yang, G., Song, X., Huang, C., Deng, Z., Shi, J., and Zhou, B. (2019, January 15\u201320). Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00099"},{"key":"ref_106","unstructured":"Couprie, C., Farabet, C., Najman, L., and LeCun, Y. (2013). Indoor semantic segmentation using depth information. 
arXiv."},{"key":"ref_107","first-page":"I","article-title":"Visual odometry","volume":"Volume 1","author":"Naroditsky","year":"2004","journal-title":"Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition"},{"key":"ref_108","doi-asserted-by":"crossref","unstructured":"Goldman, M., Hassner, T., and Avidan, S. (2019, January 16\u201317). Learn stereo, infer mono: Siamese networks for self-supervised, monocular, depth estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00348"},{"key":"ref_109","doi-asserted-by":"crossref","first-page":"e865","DOI":"10.7717\/peerj-cs.865","article-title":"Self-supervised recurrent depth estimation with attention mechanisms","volume":"8","author":"Makarov","year":"2022","journal-title":"PeerJ Comput. Sci."},{"key":"ref_110","doi-asserted-by":"crossref","first-page":"045031","DOI":"10.1088\/1361-6560\/abd955","article-title":"Stereoscopic portable hybrid gamma imaging for source depth estimation","volume":"66","author":"Bugby","year":"2021","journal-title":"Phys. Med. Biol."},{"key":"ref_111","doi-asserted-by":"crossref","unstructured":"Praveen, S. (2020). Efficient depth estimation using sparse stereo-vision with other perception techniques. Coding Theory, 111.","DOI":"10.5772\/intechopen.86303"},{"key":"ref_112","unstructured":"Mandelbaum, R., Kamberova, G., and Mintz, M. (1998, January 7). Stereo depth estimation: A confidence interval approach. Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), Bombay, India."},{"key":"ref_113","doi-asserted-by":"crossref","unstructured":"Poggi, M., Aleotti, F., Tosi, F., and Mattoccia, S. (2018, January 1\u20135). Towards real-time unsupervised monocular depth estimation on cpu. 
Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593814"},{"key":"ref_114","doi-asserted-by":"crossref","unstructured":"Cunningham, P., Cord, M., and Delany, S.J. (2008). Supervised learning. Machine Learning Techniques for Multimedia, Springer.","DOI":"10.1007\/978-3-540-75171-7_2"},{"key":"ref_115","doi-asserted-by":"crossref","first-page":"1438","DOI":"10.1109\/TMI.2019.2950936","article-title":"Dense depth estimation in monocular endoscopy with self-supervised learning methods","volume":"39","author":"Liu","year":"2019","journal-title":"IEEE Trans. Med. Imaging"},{"key":"ref_116","doi-asserted-by":"crossref","unstructured":"Abdulwahab, S., Rashwan, H.A., Masoumian, A., Sharaf, N., and Puig, D. (2021, January 14). Promising Depth Map Prediction Method from a Single Image Based on Conditional Generative Adversarial Network. Proceedings of the 23rd International Conference of the Catalan Association for Artificial Intelligence (CCIA), Tarragona, Spain.","DOI":"10.3233\/FAIA210159"},{"key":"ref_117","unstructured":"Li, B., Shen, C., Dai, Y., Van Den Hengel, A., and He, M. (2015, January 7\u201312). Depth and surface normal estimation from monocular images using regression on deep features and hierarchical crfs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_118","unstructured":"Dos Santos Rosa, N., Guizilini, V., and Grassi, V. (2019, January 2\u20136). Sparse-to-continuous: Enhancing monocular depth estimation using occupancy maps. Proceedings of the 2019 19th International Conference on Advanced Robotics (ICAR), Belo Horizonte, Brazil."},{"key":"ref_119","unstructured":"Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., and Koltun, V. (2019). Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. 
arXiv."},{"key":"ref_120","doi-asserted-by":"crossref","unstructured":"Sheng, F., Xue, F., Chang, Y., Liang, W., and Ming, A. (2022). Monocular Depth Distribution Alignment with Low Computation. arXiv.","DOI":"10.1109\/ICRA46639.2022.9811937"},{"key":"ref_121","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1007\/s00034-019-01173-3","article-title":"Unsupervised learning-based depth estimation-aided visual slam approach","volume":"39","author":"Geng","year":"2020","journal-title":"Circuits Syst. Signal Process."},{"key":"ref_122","doi-asserted-by":"crossref","unstructured":"Lu, Y., and Lu, G. (2019, January 22\u201325). Deep unsupervised learning for simultaneous visual odometry and depth estimation. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803247"},{"key":"ref_123","doi-asserted-by":"crossref","unstructured":"Pilzer, A., Lathuiliere, S., Sebe, N., and Ricci, E. (2019, January 15\u201320). Refine and distill: Exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01000"},{"key":"ref_124","unstructured":"Cho, J., Min, D., Kim, Y., and Sohn, K. (2019). A large RGB-D dataset for semi-supervised monocular depth estimation. arXiv."},{"key":"ref_125","doi-asserted-by":"crossref","unstructured":"Hoiem, D., Efros, A.A., and Hebert, M. (2005). Automatic photo pop-up. ACM Digital Library SIGGRAPH 2005 Papers, Association for Computing Machinery.","DOI":"10.1145\/1186822.1073232"},{"key":"ref_126","doi-asserted-by":"crossref","unstructured":"Masoumian, A., Marei, D.G., Abdulwahab, S., Cristiano, J., Puig, D., and Rashwan, H.A. (2021, January 14). Absolute distance prediction based on deep learning object detection and monocular depth estimation models. 
Proceedings of the 23rd International Conference of the Catalan Association for Artificial Intelligence (CCIA), Tarragona, Spain.","DOI":"10.3233\/FAIA210151"},{"key":"ref_127","unstructured":"van Dijk, T., and de Croon, G. (2019, October 27\u2013November 2). How do neural networks see depth in single images? Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_128","doi-asserted-by":"crossref","unstructured":"Mousavian, A., Pirsiavash, H., and Ko\u0161eck\u00e1, J. (2016, January 25\u201328). Joint semantic segmentation and depth estimation with deep convolutional networks. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.69"},{"key":"ref_129","doi-asserted-by":"crossref","unstructured":"Jung, H., Kim, Y., Min, D., Oh, C., and Sohn, K. (2017, January 17\u201320). Depth prediction from a single image with conditional adversarial networks. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8296575"},{"key":"ref_130","doi-asserted-by":"crossref","unstructured":"Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., and Bry, A. (2017, January 22\u201329). End-to-end learning of geometry and context for deep stereo regression. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.17"},{"key":"ref_131","doi-asserted-by":"crossref","unstructured":"Facil, J.M., Ummenhofer, B., Zhou, H., Montesano, L., Brox, T., and Civera, J. (2019, January 15\u201320). CAM-Convs: Camera-aware multi-scale convolutions for single-view depth. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01210"},{"key":"ref_132","doi-asserted-by":"crossref","unstructured":"Wofk, D., Ma, F., Yang, T.J., Karaman, S., and Sze, V. (2019, January 20\u201324). 
Fastdepth: Fast monocular depth estimation on embedded systems. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794182"},{"key":"ref_133","first-page":"730","article-title":"Single-image depth perception in the wild","volume":"29","author":"Chen","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_134","doi-asserted-by":"crossref","first-page":"41337","DOI":"10.1109\/ACCESS.2018.2857703","article-title":"Wearable depth camera: Monocular depth estimation via sparse optimization under weak supervision","volume":"6","author":"He","year":"2018","journal-title":"IEEE Access"},{"key":"ref_135","doi-asserted-by":"crossref","first-page":"1661","DOI":"10.1109\/LRA.2019.2896963","article-title":"Geo-supervised visual depth prediction","volume":"4","author":"Fei","year":"2019","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_136","doi-asserted-by":"crossref","unstructured":"Li, R., Wang, S., Long, Z., and Gu, D. (2018, January 21\u201325). Undeepvo: Monocular visual odometry through unsupervised deep learning. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia.","DOI":"10.1109\/ICRA.2018.8461251"},{"key":"ref_137","unstructured":"Wu, Z., Wu, X., Zhang, X., Wang, S., and Ju, L. (2019, October 27\u2013November 2). Spatial correspondence with generative adversarial network: Learning depth from monocular videos. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_138","doi-asserted-by":"crossref","unstructured":"Wang, Y., Wang, P., Yang, Z., Luo, C., Yang, Y., and Xu, W. (2019, January 15\u201320). Unos: Unified unsupervised optical-flow and stereo-depth estimation by watching videos. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00826"},{"key":"ref_139","unstructured":"Chen, Y., Schmid, C., and Sminchisescu, C. (2019, October 27\u2013November 2). Self-supervised learning with geometric constraints in monocular video: Connecting flow, depth, and camera. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_140","unstructured":"Li, S., Xue, F., Wang, X., Yan, Z., and Zha, H. (2019, October 27\u2013November 2). Sequential adversarial learning for self-supervised deep visual odometry. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_141","doi-asserted-by":"crossref","unstructured":"Almalioglu, Y., Saputra, M.R.U., de Gusmao, P.P., Markham, A., and Trigoni, N. (2019, January 20\u201324). Ganvo: Unsupervised deep monocular visual odometry and depth estimation with generative adversarial networks. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793512"},{"key":"ref_142","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_143","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_144","doi-asserted-by":"crossref","unstructured":"Hu, J., Ozay, M., Zhang, Y., and Okatani, T. (2019, January 7\u201311). Revisiting single image depth estimation: Toward higher resolution maps with accurate object boundaries. 
Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2019.00116"},{"key":"ref_145","doi-asserted-by":"crossref","unstructured":"Chen, X., Chen, X., and Zha, Z.J. (2019). Structure-aware residual pyramid network for monocular depth estimation. arXiv.","DOI":"10.24963\/ijcai.2019\/98"},{"key":"ref_146","doi-asserted-by":"crossref","unstructured":"Nekrasov, V., Dharmasiri, T., Spek, A., Drummond, T., Shen, C., and Reid, I. (2019, January 20\u201324). Real-time joint semantic segmentation and depth estimation using asymmetric annotations. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794220"},{"key":"ref_147","unstructured":"Hu, J., Fan, C., Jiang, H., Guo, X., Gao, Y., Lu, X., and Lam, T.L. (2021). Boosting Light-Weight Depth Estimation Via Knowledge Distillation. arXiv."},{"key":"ref_148","unstructured":"Zhou, H., Greenwood, D., and Taylor, S. (2021). Self-Supervised Monocular Depth Estimation with Internal Feature Fusion. arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/14\/5353\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:52:53Z","timestamp":1760140373000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/14\/5353"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,18]]},"references-count":148,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["s22145353"],"URL":"https:\/\/doi.org\/10.3390\/s22145353","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,18]]}}}