{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:07:02Z","timestamp":1775228822273,"version":"3.50.1"},"reference-count":69,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T00:00:00Z","timestamp":1769817600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T00:00:00Z","timestamp":1775174400000},"content-version":"vor","delay-in-days":62,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Jiangsu Provincial Key Laboratory of Power Transmission and Distribution Equipment Technology","award":["2025JSSPD02"],"award-info":[{"award-number":["2025JSSPD02"]}]},{"name":"Changzhou Leading Innovative Talent Introduction and Cultivation Project","award":["CQ20250054"],"award-info":[{"award-number":["CQ20250054"]}]},{"name":"the Natural Science Foundation of Jiangsu Province","award":["No. BK20251087"],"award-info":[{"award-number":["No. BK20251087"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J. King Saud Univ. Comput. Inf. Sci."],"published-print":{"date-parts":[[2026,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Monocular depth estimation is widely used for navigation and scene understanding, yet deployment requires balancing accuracy with predictable runtime and compact models. Many recent lightweight designs pair depthwise separable convolutions with transformer components to boost accuracy, which typically introduces a more diverse operator set and can make realized throughput more dependent on the deployment backend in practice. 
Instead, we revisit recurrent refinement from a deployment-oriented perspective and introduce R-TAFM, a purely convolutional framework that performs iterative depth refinement at a fixed working resolution with a parameter-shared decoder. We further derive a deployment-mode variant, R-TAFM-Fast, which is trained with recurrent supervision yet reduces inference to a single decoder pass, lowering latency on a commodity GPU and on a Jetson-class embedded GPU. For self-supervised learning, we introduce an adaptive reprojection objective that jointly handles occlusions and independently moving objects without auxiliary tasks, and a neighborhood-consistent correction of auto-masked stationary pixels to prevent supervision collapse in homogeneous regions. Both quantitative benchmarks and qualitative assessments demonstrate that, with\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$\approx $$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mo>\u2248<\/mml:mo>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    3.2M parameters, R-TAFM achieves accuracy comparable to or exceeding recent lightweight state-of-the-art methods, using only standard convolutions.\n                  <\/jats:p>","DOI":"10.1007\/s44443-026-00477-0","type":"journal-article","created":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T07:52:39Z","timestamp":1769845959000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["R-TAFM: purely convolutional recurrent refinement for deployment-oriented monocular depth 
estimation"],"prefix":"10.1007","volume":"38","author":[{"given":"Zhongkai","family":"Zhou","sequence":"first","affiliation":[]},{"given":"Xinnan","family":"Fan","sequence":"additional","affiliation":[]},{"given":"Pengfei","family":"Shi","sequence":"additional","affiliation":[]},{"given":"Yuanxue","family":"Xin","sequence":"additional","affiliation":[]},{"given":"Congxuan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yu","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,1,31]]},"reference":[{"key":"477_CR1","first-page":"20014","volume":"34","author":"A Ali","year":"2021","unstructured":"Ali A, Touvron H, Caron M, Bojanowski P, Douze M, Joulin A, Laptev I, Neverova N, Synnaeve G, Verbeek J et al (2021) Xcit: cross-covariance image transformers. Adv Neural Inform Process Syst 34:20014\u201320027","journal-title":"Adv Neural Inform Process Syst"},{"key":"477_CR2","doi-asserted-by":"publisher","unstructured":"Bhat SF, Alhashim I, Wonka P (2020) AdaBins: depth estimation using adaptive bins. Proceedings of the IEEE computer society conference on computer vision and pattern recognition. pp 4008\u20134017. https:\/\/doi.org\/10.1109\/CVPR46437.2021.00400","DOI":"10.1109\/CVPR46437.2021.00400"},{"issue":"9","key":"477_CR3","doi-asserted-by":"publisher","first-page":"2548","DOI":"10.1007\/s11263-021-01484-6","volume":"129","author":"JW Bian","year":"2021","unstructured":"Bian JW, Zhan H, Wang N, Li Z, Zhang L, Shen C, Cheng MM, Reid I (2021) Unsupervised scale-consistent depth learning from video. Int J Comput Vis 129(9):2548\u20132564","journal-title":"Int J Comput Vis"},{"key":"477_CR4","doi-asserted-by":"crossref","unstructured":"Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 
pp 1251\u20131258","DOI":"10.1109\/CVPR.2017.195"},{"key":"477_CR5","doi-asserted-by":"crossref","unstructured":"Cho K, Van Merri\u00ebnboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv:1409.1259","DOI":"10.3115\/v1\/W14-4012"},{"key":"477_CR6","doi-asserted-by":"crossref","unstructured":"Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 3213\u20133223","DOI":"10.1109\/CVPR.2016.350"},{"key":"477_CR7","doi-asserted-by":"crossref","unstructured":"Ding X, Zhang X, Ma N, Han J, Ding G, Sun J (2021) RepVGG: making VGG-style ConvNets great again. arXiv:2101.03697","DOI":"10.1109\/CVPR46437.2021.01352"},{"issue":"10","key":"477_CR8","doi-asserted-by":"publisher","first-page":"16940","DOI":"10.1109\/TITS.2022.3160741","volume":"23","author":"X Dong","year":"2022","unstructured":"Dong X, Garratt MA, Anavatti SG, Abbass HA (2022) Towards real-time monocular depth estimation for robotics: a survey. IEEE Trans Intell Transp Syst 23(10):16940\u201316961","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"477_CR9","doi-asserted-by":"crossref","unstructured":"Eigen D, Fergus R (2015) Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE international conference on computer vision. pp 2650\u20132658","DOI":"10.1109\/ICCV.2015.304"},{"key":"477_CR10","doi-asserted-by":"crossref","unstructured":"Elhassan MAM, Zhou C, Khan A, Benabid A, Adam ABM, Mehmood A, Wambugu N (2024) Real-time semantic segmentation for autonomous driving: a review of CNNs, transformers, and beyond. 
J King Saud Univ Comput Inf Sci 36(10):102226","DOI":"10.1016\/j.jksuci.2024.102226"},{"key":"477_CR11","doi-asserted-by":"publisher","first-page":"1609","DOI":"10.1109\/LSP.2022.3189597","volume":"29","author":"X Fan","year":"2022","unstructured":"Fan X, Zhou Z, Shi P, Xin Y, Zhou X (2022) RAFM: recurrent atrous feature modulation for accurate monocular depth estimating. IEEE Signal Process Lett 29:1609\u20131613","journal-title":"IEEE Signal Process Lett"},{"key":"477_CR12","doi-asserted-by":"crossref","unstructured":"Fang J, Chen X, Zhao J, Zeng K (2024) A scalable attention network for lightweight image super-resolution. J King Saud Univ Comput Inf Sci 36(8):102185","DOI":"10.1016\/j.jksuci.2024.102185"},{"key":"477_CR13","doi-asserted-by":"publisher","unstructured":"Fan Z, Li G, Zhou Z (2025) R-FGDepth: towards foundation models for recurrent depth learning with frequency-guided initialization and refinement. In: Pattern recognition. Elsevier, p 112843. https:\/\/doi.org\/10.1016\/j.patcog.2025.112843","DOI":"10.1016\/j.patcog.2025.112843"},{"key":"477_CR14","doi-asserted-by":"crossref","unstructured":"Garg R, Bg VK, Carneiro G, Reid I (2016) Unsupervised cnn for single view depth estimation: Geometry to the rescue. In: European conference on computer vision. Springer, pp 740\u2013756","DOI":"10.1007\/978-3-319-46484-8_45"},{"issue":"11","key":"477_CR15","doi-asserted-by":"publisher","first-page":"1231","DOI":"10.1177\/0278364913491297","volume":"32","author":"A Geiger","year":"2013","unstructured":"Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the KITTI dataset. Int J Robot Res 32(11):1231\u20131237","journal-title":"Int J Robot Res"},{"key":"477_CR16","doi-asserted-by":"crossref","unstructured":"Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision. 
pp 1440\u20131448","DOI":"10.1109\/ICCV.2015.169"},{"key":"477_CR17","doi-asserted-by":"crossref","unstructured":"Godard C, Mac Aodha O, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 270\u2013279","DOI":"10.1109\/CVPR.2017.699"},{"key":"477_CR18","doi-asserted-by":"crossref","unstructured":"Godard C, Mac Aodha O, Firman M, Brostow GJ (2019) Digging into self-supervised monocular depth estimation. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 3828\u20133838","DOI":"10.1109\/ICCV.2019.00393"},{"key":"477_CR19","doi-asserted-by":"crossref","unstructured":"Guizilini V, Ambrus R, Pillai S, Raventos A, Gaidon A (2020) 3d packing for self-supervised monocular depth estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 2485\u20132494","DOI":"10.1109\/CVPR42600.2020.00256"},{"issue":"5","key":"477_CR20","doi-asserted-by":"publisher","first-page":"1663","DOI":"10.1016\/j.jksuci.2020.08.011","volume":"34","author":"MS Hamid","year":"2022","unstructured":"Hamid MS, Manap NFA, Hamzah RA, Kadmin AF (2022) Stereo matching algorithm based on deep learning: a survey. J King Saud Univ Comput Inf Sci 34(5):1663\u20131673","journal-title":"J King Saud Univ Comput Inf Sci"},{"key":"477_CR21","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"477_CR22","unstructured":"Hinton G, Vinyals O, Dean J, others (2015) Distilling the knowledge in a neural network. 
arXiv:1503.02531"},{"issue":"2","key":"477_CR23","doi-asserted-by":"publisher","first-page":"328","DOI":"10.1109\/TPAMI.2007.1166","volume":"30","author":"H Hirschmuller","year":"2007","unstructured":"Hirschmuller H (2007) Stereo processing by semiglobal matching and mutual information. IEEE Trans Pattern Anal Mach Intell 30(2):328\u2013341","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"477_CR24","doi-asserted-by":"crossref","unstructured":"Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V, others (2019) Searching for mobilenetv3. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 1314\u20131324","DOI":"10.1109\/ICCV.2019.00140"},{"key":"477_CR25","doi-asserted-by":"crossref","unstructured":"Hui TW, Tang X, Loy CC (2018) LiteFlowNet: a lightweight convolutional neural network for optical flow estimation. In: Proceedings of IEEE conference on Computer Vision and Pattern Recognition (CVPR). pp 8981\u20138989","DOI":"10.1109\/CVPR.2018.00936"},{"key":"477_CR26","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-Excitation Networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp 132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"issue":"2","key":"477_CR27","first-page":"1502","volume":"24","author":"S Jia","year":"2022","unstructured":"Jia S, Pei X, Yao W, Wong SC (2022) Self-supervised depth estimation leveraging global perception and geometric smoothness. IEEE Trans Intell Transp Syst 24(2):1502\u20131517","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"477_CR28","doi-asserted-by":"crossref","unstructured":"Johnston A, Carneiro G (2020) Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 
pp 4756\u20134765","DOI":"10.1109\/CVPR42600.2020.00481"},{"key":"477_CR29","doi-asserted-by":"crossref","unstructured":"Klingner M, Term\u00f6hlen JA, Mikolajczyk J, Fingscheidt T (2020) Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance. In: European conference on computer vision. Springer, pp 582\u2013600","DOI":"10.1007\/978-3-030-58565-5_35"},{"key":"477_CR30","unstructured":"Lee JH, Han MK, Ko DW, Suh IH (2019) From big to small: multi-scale local planar guidance for monocular depth estimation. arXiv:1907.10326"},{"key":"477_CR31","unstructured":"Li H, Gordon A, Zhao H, Casser V, Angelova A (2020) Unsupervised monocular depth learning in dynamic scenes. arXiv:2010.16404"},{"key":"477_CR32","doi-asserted-by":"crossref","unstructured":"Li X, Li X, Zhang S, Zhang G, Zhang M, Shang H (2023) SLViT: shuffle-convolution-based lightweight Vision transformer for effective diagnosis of sugarcane leaf diseases. J King Saud Univ Comput Inf Sci 35(6):101401","DOI":"10.1016\/j.jksuci.2022.09.013"},{"issue":"3","key":"477_CR33","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1007\/s44443-025-00023-4","volume":"37","author":"H Lin","year":"2025","unstructured":"Lin H, Xu S, Su C (2025) MSTFormer: multi-granularity spatial-temporal transformers for 3D human pose estimation. J King Saud Univ Comput Inf Sci 37(3):15","journal-title":"J King Saud Univ Comput Inf Sci"},{"key":"477_CR34","unstructured":"Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv:1711.05101"},{"issue":"10","key":"477_CR35","doi-asserted-by":"publisher","first-page":"2624","DOI":"10.1109\/TPAMI.2019.2930258","volume":"42","author":"C Luo","year":"2019","unstructured":"Luo C, Yang Z, Wang P, Wang Y, Xu W, Nevatia R, Yuille A (2019) Every pixel counts++: joint learning of geometry and motion with 3d holistic understanding. 
IEEE Trans Pattern Anal Mach Intell 42(10):2624\u20132641","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"477_CR36","doi-asserted-by":"crossref","unstructured":"Lyu X, Liu L, Wang M, Kong X, Liu L, Liu Y, Chen X, Yuan Y (2020) HR-Depth: high resolution self-supervised monocular depth estimation. arXiv:2012.07356","DOI":"10.1609\/aaai.v35i3.16329"},{"key":"477_CR37","doi-asserted-by":"crossref","unstructured":"Ma N, Zhang X, Zheng HT, Sun J (2018) ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV) (pp. 116\u2013131)","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"477_CR38","doi-asserted-by":"crossref","unstructured":"Misra D, Nalamada T, Arasanipalai AU, Hou Q (2021) Rotate to attend: convolutional triplet attention module. In: Proceedings of the IEEE\/CVF winter conference on applications of computer vision. pp 3139\u20133148","DOI":"10.1109\/WACV48630.2021.00318"},{"key":"477_CR39","doi-asserted-by":"publisher","unstructured":"Patil V, Sakaridis C, Liniger A, Van Gool L (2022) P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior. In: IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR. IEEE, pp 1600\u20131611. https:\/\/doi.org\/10.1109\/CVPR52688.2022.00166","DOI":"10.1109\/CVPR52688.2022.00166"},{"key":"477_CR40","doi-asserted-by":"crossref","unstructured":"Piccinelli L, Sakaridis C, Yu F (2023) idisc: Internal discretization for monocular depth estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 21477\u201321487","DOI":"10.1109\/CVPR52729.2023.02057"},{"key":"477_CR41","doi-asserted-by":"crossref","unstructured":"Pilzer A, Xu D, Puscas M, Ricci E, Sebe N (2018) Unsupervised adversarial depth estimation using cycled generative networks. In: 2018 international conference on 3D vision (3DV). 
IEEE, pp 587\u2013595","DOI":"10.1109\/3DV.2018.00073"},{"key":"477_CR42","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015) U-net: Convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"477_CR43","doi-asserted-by":"crossref","unstructured":"Smith LN, Topin N (2019) Super-convergence: Very fast training of neural networks using large learning rates. In: Artificial intelligence and machine learning for multi-domain operations applications, vol 11006. SPIE, pp 369\u2013386","DOI":"10.1117\/12.2520589"},{"issue":"8","key":"477_CR44","doi-asserted-by":"publisher","first-page":"11654","DOI":"10.1109\/TITS.2021.3106055","volume":"23","author":"Z Song","year":"2021","unstructured":"Song Z, Lu J, Yao Y, Zhang J (2021) Self-supervised depth completion from direct visual-LiDAR odometry in autonomous driving. IEEE Trans Intell Transp Syst 23(8):11654\u201311665","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"477_CR45","doi-asserted-by":"publisher","first-page":"4691","DOI":"10.1109\/TIP.2021.3074306","volume":"30","author":"X Song","year":"2021","unstructured":"Song X, Li W, Zhou D, Dai Y, Fang J, Li H, Zhang L (2021) MLDA-Net: multi-level dual attention-based network for self-supervised monocular depth estimation. IEEE Trans Image Process 30:4691\u20134705","journal-title":"IEEE Trans Image Process"},{"issue":"7","key":"477_CR46","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/s44443-025-00201-4","volume":"37","author":"M Sun","year":"2025","unstructured":"Sun M, Rui T, Liu J, Wang D, Yang C, Zheng N (2025) Cross-modal unsupervised domain adaptation for 3D semantic segmentation via multi-scale fusion-then-distillation. 
J King Saud Univ Comput Inf Sci 37(7):193","journal-title":"J King Saud Univ Comput Inf Sci"},{"key":"477_CR47","doi-asserted-by":"crossref","unstructured":"Uhrig J, Schneider N, Schneider L, Franke U, Brox T, Geiger A (2017) Sparsity invariant CNNs. In: 2017 international conference on 3D Vision (3DV). IEEE, pp 11\u201320","DOI":"10.1109\/3DV.2017.00012"},{"issue":"4","key":"477_CR48","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1109\/TIP.2003.819861","volume":"13","author":"Z Wang","year":"2004","unstructured":"Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600\u2013612","journal-title":"IEEE Trans Image Process"},{"key":"477_CR49","doi-asserted-by":"publisher","first-page":"4130","DOI":"10.1109\/TIP.2020.2968751","volume":"29","author":"A Wang","year":"2020","unstructured":"Wang A, Fang Z, Gao Y, Tan S, Wang S, Ma S, Hwang JN (2020) Adversarial learning for joint optimization of depth and ego-motion. IEEE Trans Image Process 29:4130\u20134142","journal-title":"IEEE Trans Image Process"},{"issue":"1","key":"477_CR50","doi-asserted-by":"publisher","first-page":"308","DOI":"10.1109\/TITS.2020.3010418","volume":"23","author":"G Wang","year":"2020","unstructured":"Wang G, Zhang C, Wang H, Wang J, Wang Y, Wang X (2020) Unsupervised learning of depth, optical flow and pose with occlusion from 3d geometry. IEEE Trans Intell Transp Syst 23(1):308\u2013320","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"477_CR51","doi-asserted-by":"crossref","unstructured":"Wang C, Miguel Buenaposada J, Zhu R, Lucey S (2018) Learning depth from monocular videos using direct methods. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 
pp 2022\u20132030","DOI":"10.1109\/CVPR.2018.00216"},{"key":"477_CR52","doi-asserted-by":"crossref","unstructured":"Watson J, Firman M, Monszpart A, Brostow GJ (2020) Footprints and free space from a single color image. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 11\u201320","DOI":"10.1109\/CVPR42600.2020.00009"},{"key":"477_CR53","doi-asserted-by":"crossref","unstructured":"Watson J, Mac Aodha O, Prisacariu V, Brostow G, Firman M (2021) The temporal opportunist: self-supervised multi-frame monocular depth. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 1164\u20131174","DOI":"10.1109\/CVPR46437.2021.00122"},{"key":"477_CR54","doi-asserted-by":"crossref","unstructured":"Wei J, Pan S, Gao W, Guo P (2024) LAM-Depth: Laplace-Attention Module-Based Self-Supervised Monocular Depth Estimation. IEEE Trans Intell Transp Syst","DOI":"10.1109\/TITS.2024.3402655"},{"key":"477_CR55","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV). pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"477_CR56","doi-asserted-by":"publisher","first-page":"8811","DOI":"10.1109\/TIP.2021.3120670","volume":"30","author":"X Xu","year":"2021","unstructured":"Xu X, Chen Z, Yin F (2021) Multi-scale spatial attention-guided monocular depth estimation with semantic enhancement. IEEE Trans Image Process 30:8811\u20138822","journal-title":"IEEE Trans Image Process"},{"issue":"10","key":"477_CR57","doi-asserted-by":"publisher","first-page":"17039","DOI":"10.1109\/TITS.2021.3093592","volume":"23","author":"Y Xu","year":"2022","unstructured":"Xu Y, Wang Y, Huang R, Lei Z, Yang J, Li Z (2022) Unsupervised learning of depth estimation and camera pose with multi-scale GANs. 
IEEE Trans Intell Transp Syst 23(10):17039\u201317047","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"477_CR58","doi-asserted-by":"publisher","unstructured":"Yang B, Guo Y, Ni R, Liu Y, Li G, Hu C (2025) Asymmetric multimodal guidance fusion network for realtime visible and thermal semantic segmentation. Eng Appl Artif Intell 142:109881. https:\/\/doi.org\/10.1016\/j.engappai.2024.109881","DOI":"10.1016\/j.engappai.2024.109881"},{"key":"477_CR59","doi-asserted-by":"publisher","unstructured":"Yang B, Yang S, Wang P, Wang H, Jiang J, Ni R, Yang C (2024) FRPNet: an improved Faster-ResNet with PASPP for real-time semantic segmentation in the unstructured field scene. Comput Electron Agric 217:108623. https:\/\/doi.org\/10.1016\/j.compag.2024.108623","DOI":"10.1016\/j.compag.2024.108623"},{"key":"477_CR60","doi-asserted-by":"publisher","first-page":"4492","DOI":"10.1109\/TIP.2021.3072215","volume":"30","author":"X Ye","year":"2021","unstructured":"Ye X, Fan X, Zhang M, Xu R, Zhong W (2021) Unsupervised monocular depth estimation via recursive stereo distillation. IEEE Trans Image Process 30:4492\u20134504","journal-title":"IEEE Trans Image Process"},{"key":"477_CR61","doi-asserted-by":"publisher","unstructured":"Yuan W, Gu X, Dai Z, Zhu S, Tan P (2022) Neural Window Fully-connected CRFs for Monocular Depth Estimation. In: IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR. IEEE, pp 3906\u20133915. https:\/\/doi.org\/10.1109\/CVPR52688.2022.00389","DOI":"10.1109\/CVPR52688.2022.00389"},{"key":"477_CR62","doi-asserted-by":"publisher","first-page":"3251","DOI":"10.1109\/TIP.2022.3167307","volume":"31","author":"Y Zhang","year":"2022","unstructured":"Zhang Y, Gong M, Li J, Zhang M, Jiang F, Zhao H (2022) Self-supervised monocular depth estimation with multiscale perception. 
IEEE Trans Image Process 31:3251\u20133266","journal-title":"IEEE Trans Image Process"},{"key":"477_CR63","doi-asserted-by":"crossref","unstructured":"Zhang N, Nex F, Vosselman G, Kerle N (2023) Lite-mono: A lightweight cnn and transformer architecture for self-supervised monocular depth estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. pp 18537\u201318546","DOI":"10.1109\/CVPR52729.2023.01778"},{"key":"477_CR64","doi-asserted-by":"crossref","unstructured":"Zhou T, Brown M, Snavely N, Lowe DG (2017) Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1851\u20131858","DOI":"10.1109\/CVPR.2017.700"},{"key":"477_CR65","doi-asserted-by":"crossref","unstructured":"Zhou Z, Dong Q (2022a) Learning Occlusion-aware Coarse-to-Fine Depth Map for Self-supervised Monocular Depth Estimation. In: Proceedings of the 30th ACM international conference on multimedia. pp 6386\u20136395","DOI":"10.1145\/3503161.3548381"},{"key":"477_CR66","doi-asserted-by":"crossref","unstructured":"Zhou Z, Dong Q (2022b) Self-distilled feature aggregation for self-supervised monocular depth estimation. In: European conference on computer vision. Springer, pp 709\u2013726","DOI":"10.1007\/978-3-031-19769-7_41"},{"key":"477_CR67","doi-asserted-by":"crossref","unstructured":"Zhou Z, Dong Q (2023) Two-in-One Depth: Bridging the Gap Between Monocular and Binocular Self-supervised Depth Estimation. In: Proceedings of the IEEE\/CVF international conference on computer vision. pp 9411\u20139421","DOI":"10.1109\/ICCV51070.2023.00863"},{"key":"477_CR68","doi-asserted-by":"crossref","unstructured":"Zhou Z, Fan X, Shi P, Xin Y (2021) R-MSFM: recurrent multi-scale feature modulation for monocular depth estimating. In: Proceedings of the IEEE\/CVF international conference on computer vision. 
pp 12777\u201312786","DOI":"10.1109\/ICCV48922.2021.01254"},{"key":"477_CR69","doi-asserted-by":"crossref","unstructured":"Zou Y, Luo Z, Huang JB (2018) Df-net: unsupervised joint learning of depth and flow using cross-task consistency. In: Proceedings of the European conference on computer vision (ECCV). pp 36\u201353","DOI":"10.1007\/978-3-030-01228-1_3"}],"container-title":["Journal of King Saud University Computer and Information Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44443-026-00477-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44443-026-00477-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44443-026-00477-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T14:20:52Z","timestamp":1775226052000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44443-026-00477-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,31]]},"references-count":69,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,4]]}},"alternative-id":["477"],"URL":"https:\/\/doi.org\/10.1007\/s44443-026-00477-0","relation":{},"ISSN":["1319-1578","2213-1248"],"issn-type":[{"value":"1319-1578","type":"print"},{"value":"2213-1248","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,31]]},"assertion":[{"value":"29 October 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 January 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 January 
2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}},{"value":"This article does not contain any studies with human participants or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}}],"article-number":"108"}}