{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T00:39:23Z","timestamp":1760229563795,"version":"build-2065373602"},"reference-count":68,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,6,17]],"date-time":"2022-06-17T00:00:00Z","timestamp":1655424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32071680","182102110160","201701013","2019B020223003","2021ZY72"],"award-info":[{"award-number":["32071680","182102110160","201701013","2019B020223003","2021ZY72"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Science and Technology Department of Henan Province","award":["32071680","182102110160","201701013","2019B020223003","2021ZY72"],"award-info":[{"award-number":["32071680","182102110160","201701013","2019B020223003","2021ZY72"]}]},{"name":"Young Teachers Found of Xinyang Agriculture and Forestry University","award":["32071680","182102110160","201701013","2019B020223003","2021ZY72"],"award-info":[{"award-number":["32071680","182102110160","201701013","2019B020223003","2021ZY72"]}]},{"name":"Key-Area research and development program of Guangdong province","award":["32071680","182102110160","201701013","2019B020223003","2021ZY72"],"award-info":[{"award-number":["32071680","182102110160","201701013","2019B020223003","2021ZY72"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["32071680","182102110160","201701013","2019B020223003","2021ZY72"],"award-info":[{"award-number":["32071680","182102110160","201701013","2019B020223003","2021ZY72"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Monocular depth estimation is 
a fundamental yet challenging task in computer vision, as depth information is lost when 3D scenes are mapped to 2D images. Although deep learning-based methods have led to considerable improvements for this task, most existing approaches still fail to overcome this limitation when only a single image is available. Supervised learning methods model depth estimation as a regression problem and, as a result, require large amounts of ground truth depth data for training in actual scenarios. Unsupervised learning methods treat depth estimation as the synthesis of a new disparity map, which means that rectified stereo image pairs need to be used as the training dataset. Aiming to solve this problem, we present an encoder-decoder based framework, which infers depth maps from monocular video snippets in an unsupervised manner. First, we design an unsupervised learning scheme for the monocular depth estimation task based on the basic principles of structure from motion (SfM); it uses only adjacent video clips, rather than paired training data, as supervision. Second, our method predicts two confidence masks to improve the robustness of the depth estimation model and to alleviate the occlusion problem. Finally, we leverage the largest-scale and minimum-depth loss, instead of the multiscale and average loss, to improve the accuracy of depth estimation. 
The experimental results on the benchmark KITTI dataset for depth estimation show that our method outperforms competing unsupervised methods.<\/jats:p>","DOI":"10.3390\/rs14122906","type":"journal-article","created":{"date-parts":[[2022,6,17]],"date-time":"2022-06-17T11:45:44Z","timestamp":1655466344000},"page":"2906","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Encoder-Decoder Structure with Multiscale Receptive Field Block for Unsupervised Depth Estimation from Monocular Video"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0314-1194","authenticated-orcid":false,"given":"Songnan","family":"Chen","sequence":"first","affiliation":[{"name":"School of Mathematics and Computer Science, Wuhan Polytechnic University, No. 36 Huanhu Middle Road, Dongxihu District, Wuhan 430048, China"}]},{"given":"Junyu","family":"Han","sequence":"additional","affiliation":[{"name":"School of Technology, Beijing Forestry University, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China"}]},{"given":"Mengxia","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Technology, Beijing Forestry University, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China"}]},{"given":"Ruifang","family":"Dong","sequence":"additional","affiliation":[{"name":"School of Technology, Beijing Forestry University, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China"}]},{"given":"Jiangming","family":"Kan","sequence":"additional","affiliation":[{"name":"School of Technology, Beijing Forestry University, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Tulsiani, S., Gupta, S., Fouhey, D., Efros, A.A., and Malik, J. (2018, January 18\u201323). 
Factoring Shape, Pose, and Layout from the 2D image of a 3D scene. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00039"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Gupta, S., Arbelaez, P., Girshick, R., and Malik, J. (2015, January 7\u201312). Aligning 3D models to RGB-D images of cluttered scenes. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299105"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Xu, B., and Chen, Z. (2018, January 18\u201323). Multi-level Fusion Based 3D Object Detection from Monocular Images. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00249"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wang, Y., Chao, W., Garg, D., Hariharan, B., Campbell, M., and Weinberger, K.Q. (2019, January 15\u201320). Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00864"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5558","DOI":"10.1109\/LRA.2020.3007457","article-title":"Real-time Fusion Network for RGB-D Semantic Segmentation Incorporating Unexpected Obstacle Detection for Road-driving Images","volume":"5","author":"Sun","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Hu, X., Yang, K., Fei, L., and Wang, K. (2019, January 22\u201325). ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation. 
Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, China.","DOI":"10.1109\/ICIP.2019.8803025"},{"key":"ref_7","unstructured":"Deng, L.Y., Yang, M., Li, T.Y., He, Y.S., and Wang, C.X. (2019). RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ma, F.C., and Karaman, S. (2018, January 21\u201325). Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8460184"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"2024","DOI":"10.1109\/TPAMI.2015.2505283","article-title":"Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields","volume":"38","author":"Liu","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Gupta, A., Efros, A.A., and Hebert, M. (2010, January 5\u201311). Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics. Proceedings of the European Conference on Computer Vision (ECCV), Hersonissos, Greece.","DOI":"10.1007\/978-3-642-15561-1_35"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Hedau, V., Hoiem, D., and Forsyth, D. (2010, January 5\u201311). Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry. Proceedings of the European Conference on Computer Vision (ECCV), Hersonissos, Greece.","DOI":"10.1007\/978-3-642-15567-3_17"},{"key":"ref_12","unstructured":"Lee, D.C., Gupta, A., Hebert, M., and Kanade, T. (2010, January 4\u20137). Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces. 
Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Vancouver, BC, Canada."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Schwing, A.G., and Urtasun, R. (2012, January 7\u201313). Efficient Exact Inference for 3D Indoor Scene Understanding. Proceedings of the European Conference on Computer Vision (ECCV), Florence, Italy.","DOI":"10.1109\/CVPR.2012.6248006"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liu, B., Gould, S., and Koller, D. (2010, January 13\u201318). Single image depth estimation from predicted semantic labels. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539823"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Russell, B.C., and Torralba, A. (2009, January 20\u201325). Building a database of 3D scenes from user annotations. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA.","DOI":"10.1109\/CVPRW.2009.5206643"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wu, C., Frahm, J., and Pollefeys, M. (2011, January 20\u201325). Repetition-based Dense Single-View Reconstruction. Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995551"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2144","DOI":"10.1109\/TPAMI.2014.2316835","article-title":"Depth Transfer: Depth Extraction from Video Using Non-parametric Sampling","volume":"36","author":"Karsch","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","first-page":"12","article-title":"Automatic 2D-to-3D image conversion using 3D examples from the internet","volume":"8288","author":"Konrad","year":"2012","journal-title":"Proc. SPIE Int. Soc. Opt. 
Eng."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Konrad, J., Wang, M., and Ishwar, P. (2012, January 16\u201321). 2D-to-3D image conversion by learning depth from examples. Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6238903"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, M., Salzmann, M., and He, X. (2014, January 23\u201328). Discrete-Continuous Depth Estimation from a Single Image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.97"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yamaguchi, K., Mcallester, D., and Urtasun, R. (2014, January 4\u201313). Efficient Joint Segmentation, Occlusion Labeling, Stereo and Flow Estimation. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_49"},{"key":"ref_22","unstructured":"Bleyer, M., Rhemann, C., and Rother, C. (September, January 29). PatchMatch Stereo\u2014Stereo Matching with Slanted Support Windows. Proceedings of the British Machine Vision Conference (BMVC), Dundee, UK."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1023\/A:1014573219977","article-title":"A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms","volume":"47","author":"Scharstein","year":"2002","journal-title":"Int. J. Comput. Vis."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1109\/TCSVT.2015.2513663","article-title":"Cross-Scale Cost Aggregation for Stereo Matching","volume":"27","author":"Zhang","year":"2017","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_25","unstructured":"Yang, Q.X. (2012, January 16\u201321). A non-local cost aggregation method for stereo matching. 
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Heise, P., Klose, S., Jensen, B., and Knoll, A. (2013, January 1\u20138). PM-Huber: PatchMatch with Huber Regularization for Stereo Matching. Proceedings of the 2013 IEEE International Conference on Computer Vision (ICCV), Sydney, Australia.","DOI":"10.1109\/ICCV.2013.293"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1007\/s11263-007-0107-3","article-title":"Modeling the world from internet photo collections","volume":"80","author":"Snavely","year":"2008","journal-title":"Int. J. Comput. Vis."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Newcombe, R.A., Lovegrove, S.J., and Davison, A.J. (2011, January 6\u201313). DTAM: Dense tracking and mapping in real-time. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126513"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Schonberger, J.L., and Frahm, J.M. (2016, January 27\u201330). Structure-from-Motion Revisited. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.445"},{"key":"ref_30","unstructured":"Eigen, D., Puhrsch, C., and Fergus, R. (2014, January 7\u201314). Depth map prediction from a single image using a multi-scale deep network. Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Xu, D., Wang, W., Tang, H., Liu, H., Sebe., N., and Ricci, E. (2018, January 18\u201323). Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00412"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chen, X.T., Chen, X.J., and Zha, Z.J. (2019, January 10\u201316). Structure-Aware Residual Pyramid Network for Monocular Depth Estimation. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Macao, China.","DOI":"10.24963\/ijcai.2019\/98"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Mayer, N., Ilg, E., Husser, P., Fischer, P., Cremers, D., Dosovitskiy, A., and Brox, T. (2016, January 27\u201330). A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.438"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kundu, J.N., Uppala, P.K., Pahuja, A., and Babu, R.V. (2018, January 18\u201323). AdaDepth: Unsupervised content congruent adaptation for depth estimation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00281"},{"key":"ref_35","unstructured":"Chen, W., Fu, Z., Yang, D., and Deng, J. (2016, January 5\u201310). Single-Image Depth Perception in the Wild. Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, Z., and Snavely, N. (2018, January 18\u201323). MegaDepth: Learning Single-View Depth Prediction from Internet Photos. 
Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00218"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Li, Z., Dekel, T., Cole, F., Tucker, R., Snavely, N., Liu, C., and Freeman, W.T. (2019, January 15\u201320). Learning the Depths of Moving People by Watching Frozen People. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00465"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Garg, R., BGV, K., and Reid, I. (2016, January 8\u201316). Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_45"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Godard, C., Mac Aodha, O., and Brostow, G.J. (2017, January 21\u201326). Unsupervised Monocular Depth Estimation with Left-Right Consistency. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.699"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pilzer, A., Xu, D., Puscas, M., Ricci, E., and Sebe, N. (2018, January 5\u20138). Unsupervised Adversarial Depth Estimation using Cycled Generative Networks. Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy.","DOI":"10.1109\/3DV.2018.00073"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1145\/1531326.1531330","article-title":"Patchmatch: A randomized correspondence algorithm for structural image editing","volume":"28","author":"Barnes","year":"2009","journal-title":"ACM Trans. Graph. (SIGGRAPH)"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhan, H., Garg, R., Weerasekera, C.S., Li, K., Agarwal, H., and Reid, I. 
(2018, January 18\u201323). Unsupervised learning of monocular depth estimation and visual odometry with deep feature reconstruction. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00043"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhou, T.H., Brown, M., Snavely, N., and Lowe, D.G. (2017, January 21\u201326). Unsupervised Learning of Depth and Ego-Motion from Video. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.700"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Wang, C., Buenaposada, J.M., Zhu, R., and Lucey, S. (2018, January 18\u201323). Learning Depth from Monocular Videos using Direct Methods. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00216"},{"key":"ref_45","unstructured":"Bian, J.W., Li, Z., Wang, N., Zhan, H., Shen, C., Cheng, M.M., and Reid, I. (2019, January 8\u201314). Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video. Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Vancouver, BC, Canada."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Yang, Z., Wang, P., Xu, W., Zhao, L., and Nevatia, R. (2017). Unsupervised learning of geometry with edge-aware depth-normal consistency. arXiv.","DOI":"10.1609\/aaai.v32i1.12257"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Mahjourian, R., Wicke, M., and Angelova, A. (2018, January 18\u201323). Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints. 
Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00594"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zou, Y., Luo, Z., and Huang, J. (2018, January 8\u201314). DF-Net: Unsupervised joint learning of depth and flow using cross-task consistency. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_3"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Shen, T., Luo, Z., Lei, Z., Deng, H., and Long, Q. (2019, January 20\u201324). Beyond Photometric Loss for Self-Supervised Ego-Motion Estimation. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793479"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Casser, V., Pirk, S., Mahjourian, R., and Angelova, A. (2019, January 15\u201320). Unsupervised monocular depth and egomotion learning with structure and semantics. Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00051"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Klodt, M., and Vedaldi, A. (2018, January 8\u201314). Supervising the new with the old: Learning SFM from SFM. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_43"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Yin, Z., and Shi, J. (2018, January 18\u201323). GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00212"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., and Black, M.J. 
(2019, January 15\u201320). Competitive Collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01252"},{"key":"ref_54","unstructured":"Liu, L.Y., Jiang, H.M., He, P.C., Chen, W.Z., Liu, X.D., Gao, J.F., and Han, J.W. (2019). On the variance of the adaptive learning rate and beyond. arXiv."},{"key":"ref_55","unstructured":"Zhang, M.R., Lucas, J., Hinton, G., and Ba, J. (2019, January 8\u201314). Lookahead Optimizer: K steps forward, 1 step back. Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Vancouver, BC, Canada."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The KITTI vision benchmark suite. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"106804","DOI":"10.1016\/j.asoc.2020.106804","article-title":"Monocular Image Depth Prediction without Depth Sensors: An Unsupervised Learning Method","volume":"97","author":"Chen","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_58","unstructured":"Gao, H., Yu, S., Zhuang, L., Sedra, D., and Weinberger, K. (2016). Deep Networks with Stochastic Depth, Springer International Publishing."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_60","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., and Rabinovich, A. (2015, January 7\u201312). Going Deeper with Convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P., Li, P., and Hu, Q. (2020, January 13\u201319). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_63","unstructured":"Kingma, D., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_64","unstructured":"Djork-Arn\u00e9, C., Unterthiner, T., and Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv."},{"key":"ref_65","unstructured":"Nair, V., and Hinton, G.E. (2010, January 21\u201324). Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the International Conference on Machine Learning (ICML), Haifa, Israel."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Zhou","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_67","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., and Chintala, S. (2019, January 8\u201314). 
PyTorch: An imperative style, high-performance deep learning library. Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Vancouver, BC, Canada."},{"key":"ref_68","unstructured":"Jia, D., Wei, D., Socher, R., Li, L.J., Kai, L., and Li, F.F. (2009, January 20\u201325). ImageNet: A Large-Scale Hierarchical Image Database. Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/12\/2906\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:34:17Z","timestamp":1760139257000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/12\/2906"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,17]]},"references-count":68,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["rs14122906"],"URL":"https:\/\/doi.org\/10.3390\/rs14122906","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2022,6,17]]}}}