{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T08:08:59Z","timestamp":1768291739412,"version":"3.49.0"},"reference-count":76,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,1,16]],"date-time":"2025-01-16T00:00:00Z","timestamp":1736985600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,16]],"date-time":"2025-01-16T00:00:00Z","timestamp":1736985600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001774","name":"University of Sydney","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001774","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2025,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Multi-camera depth estimation has gained significant attention in autonomous driving due to its importance in perceiving complex environments. However, extending monocular self-supervised methods to multi-camera setups introduces unique challenges that existing techniques often fail to address. In this paper, we propose <jats:bold>STViT+<\/jats:bold>, a novel Transformer-based framework for self-supervised multi-camera depth estimation. Our key contributions include: 1) the <jats:bold>Spatial-Temporal Transformer (STTrans)<\/jats:bold>, which integrates local spatial connectivity and global context to capture enriched spatial-temporal cross-view correlations, resulting in more accurate 3D geometry reconstruction; 2) the <jats:bold>Spatial-Temporal Photometric Consistency Correction (STPCC)<\/jats:bold> strategy that mitigates the impact of varying illumination, ensuring brightness consistency across frames during photometric loss calculation; 3) the <jats:bold>Adversarial Geometry Regularization (AGR)<\/jats:bold> module, which employs Generative Adversarial Networks to impose spatial constraints by using unpaired depth maps, enhancing performance under adverse conditions such as rain and nighttime driving. 
Extensive evaluations on large-scale autonomous driving datasets, including Nuscenes and DDAD, confirm that STViT+ sets a new benchmark for multi-camera depth estimation.<\/jats:p>","DOI":"10.1007\/s10489-024-06191-6","type":"journal-article","created":{"date-parts":[[2025,1,16]],"date-time":"2025-01-16T06:03:39Z","timestamp":1737007419000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["STViT+: improving self-supervised multi-camera depth estimation with spatial-temporal context and adversarial geometry regularization"],"prefix":"10.1007","volume":"55","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2140-8657","authenticated-orcid":false,"given":"Zhuo","family":"Chen","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1139-4183","authenticated-orcid":false,"given":"Haimei","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Xiaoshuai","family":"Hao","sequence":"additional","affiliation":[]},{"given":"Bo","family":"Yuan","sequence":"additional","affiliation":[]},{"given":"Xiu","family":"Li","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,16]]},"reference":[{"key":"6191_CR1","doi-asserted-by":"crossref","unstructured":"Wang Y, Chao W-L, Garg D, Hariharan B, Campbell M, Weinberger KQ (2019) Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 8445\u20138453","DOI":"10.1109\/CVPR.2019.00864"},{"issue":"10","key":"6191_CR2","doi-asserted-by":"publisher","first-page":"16940","DOI":"10.1109\/TITS.2022.3160741","volume":"23","author":"X Dong","year":"2022","unstructured":"Dong X, Garratt MA, Anavatti SG, Abbass HA (2022) Towards real-time monocular depth estimation for robotics: A survey. IEEE Trans Intell Transp Syst 23(10):16940\u201316961","journal-title":"IEEE Trans Intell Transp Syst"},{"issue":"1","key":"6191_CR3","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1109\/TITS.2019.2955598","volume":"22","author":"X Yang","year":"2019","unstructured":"Yang X, Chen J, Dang Y, Luo H, Tang Y, Liao C, Chen P, Cheng K-T (2019) Fast depth prediction and obstacle avoidance on a monocular drone using probabilistic convolutional neural network. IEEE Trans Intell Transp Syst 22(1):156\u2013167","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"6191_CR4","doi-asserted-by":"crossref","unstructured":"El\u00a0Jamiy F, Marsh R (2019) Distance estimation in virtual reality and augmented reality: A survey. In: 2019 IEEE International conference on electro information technology (EIT), IEEE, pp 063\u2013068","DOI":"10.1109\/EIT.2019.8834182"},{"key":"6191_CR5","doi-asserted-by":"crossref","unstructured":"Abed A, Akrout B, Amous I (2024) Deep learning-based few-shot person re-identification from top-view rgb and depth images. Neural Comput & Applic pp 1\u201318","DOI":"10.1007\/s00521-024-10239-6"},{"key":"6191_CR6","doi-asserted-by":"crossref","unstructured":"Wei Q, Shan J, Cheng H, Yu Z, Lijuan B, Haimei Z (2016) A method of 3D human-motion capture and reconstruction based on depth information. In: 2016 IEEE International conference on mechatronics and automation. 
IEEE, pp 187\u2013192","DOI":"10.1109\/ICMA.2016.7558558"},{"key":"6191_CR7","doi-asserted-by":"crossref","unstructured":"Li Y, Ge Z, Yu G, Yang J, Wang Z, Shi Y, Sun J, Li Z (2023) Bevdepth: Acquisition of reliable depth for multi-view 3d object detection. In: Proceedings of the AAAI conference on artificial intelligence, vol 37, pp 1477\u20131485","DOI":"10.1609\/aaai.v37i2.25233"},{"key":"6191_CR8","doi-asserted-by":"crossref","unstructured":"Zhao H, Zhang Q, Zhao S, Chen Z, Zhang J, Tao D (2024) Simdistill: Simulated multi-modal distillation for bev 3d object detection. In: Proceedings of the AAAI conference on artificial intelligence, vol 38, pp 7460\u20137468","DOI":"10.1609\/aaai.v38i7.28577"},{"key":"6191_CR9","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2024.104630","volume":"174","author":"M Reda","year":"2024","unstructured":"Reda M, Onsy A, Haikal AY, Ghanbari A (2024) Path planning algorithms in the autonomous driving system: A comprehensive review. Robot Auton Syst 174:104630","journal-title":"Robot Auton Syst"},{"key":"6191_CR10","doi-asserted-by":"crossref","unstructured":"M\u00fcller H, Niculescu V, Polonelli T, Magno M, Benini L (2023) Robust and efficient depth-based obstacle avoidance for autonomous miniaturized uavs. IEEE Trans Robot","DOI":"10.1109\/TRO.2023.3315710"},{"key":"6191_CR11","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097\u20131105"},{"key":"6191_CR12","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"6191_CR13","unstructured":"Eigen D, Puhrsch C, Fergus R (2014) Depth map prediction from a single image using a multi-scale deep network. In: Advances in neural information processing systems, pp 2366\u20132374"},{"key":"6191_CR14","doi-asserted-by":"crossref","unstructured":"Yin W, Liu Y, Shen C, Yan Y (2019) Enforcing geometric constraints of virtual normal for depth prediction. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 5684\u20135693","DOI":"10.1109\/ICCV.2019.00578"},{"key":"6191_CR15","doi-asserted-by":"crossref","unstructured":"Fu H, Gong M, Wang C, Batmanghelich K, Tao D (2018) Deep ordinal regression network for monocular depth estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2002\u20132011","DOI":"10.1109\/CVPR.2018.00214"},{"key":"6191_CR16","unstructured":"Bhat SF, Alhashim I, Wonka P (2021) Adabins: Depth estimation using adaptive bins. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 4009\u20134018"},{"key":"6191_CR17","doi-asserted-by":"crossref","unstructured":"Zhou T, Brown M, Snavely N, Lowe DG (2017) Unsupervised learning of depth and ego-motion from video. In: Proceedings of the IEEE conference on computer vision and pattern recognition","DOI":"10.1109\/CVPR.2017.700"},{"key":"6191_CR18","doi-asserted-by":"crossref","unstructured":"Godard C, Mac\u00a0Aodha O, Firman M, Brostow GJ (2019) Digging into self-supervised monocular depth estimation. 
In: Proceedings of the IEEE international conference on computer vision, pp 3828\u20133838","DOI":"10.1109\/ICCV.2019.00393"},{"key":"6191_CR19","doi-asserted-by":"crossref","unstructured":"Zhao C, Zhang Y, Poggi M, Tosi F, Guo X, Zhu Z, Huang G, Tang Y, Mattoccia S (2022) Monovit: Self-supervised monocular depth estimation with a vision transformer. arXiv preprint arXiv:2208.03543","DOI":"10.1109\/3DV57658.2022.00077"},{"key":"6191_CR20","doi-asserted-by":"crossref","unstructured":"Watson J, Mac\u00a0Aodha O, Prisacariu V, Brostow G, Firman M (2021) The temporal opportunist: Self-supervised multi-frame monocular depth. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1164\u20131174","DOI":"10.1109\/CVPR46437.2021.00122"},{"key":"6191_CR21","doi-asserted-by":"crossref","unstructured":"Guizilini V, Ambru\u0219 R, Chen D, Zakharov S, Gaidon A (2022) Multi-frame self-supervised depth with transformers. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 160\u2013170","DOI":"10.1109\/CVPR52688.2022.00026"},{"key":"6191_CR22","unstructured":"Zhang S, Zhao C (2023) Dyna-depthformer: Multi-frame transformer for self-supervised depth estimation in dynamic scenes. arXiv preprint arXiv:2301.05871"},{"issue":"2","key":"6191_CR23","doi-asserted-by":"publisher","first-page":"5397","DOI":"10.1109\/LRA.2022.3150884","volume":"7","author":"V Guizilini","year":"2022","unstructured":"Guizilini V, Vasiljevic I, Ambrus R, Shakhnarovich G, Gaidon A (2022) Full surround monodepth from multiple cameras. IEEE Robot Autom Lett 7(2):5397\u20135404","journal-title":"IEEE Robot Autom Lett"},{"key":"6191_CR24","unstructured":"Wei Y, Zhao L, Zheng W, Zhu Z, Rao Y, Huang G, Lu J, Zhou J (2023) Surrounddepth: Entangling surrounding views for self-supervised multi-camera depth estimation. In: Conference on robot learning, PMLR, pp 539\u2013549"},{"key":"6191_CR25","doi-asserted-by":"crossref","unstructured":"Xu J, Liu X, Bai Y, Jiang J, Wang K, Chen X, Ji X (2022) Multi-camera collaborative depth prediction via consistent structure estimation. In: Proceedings of the 30th ACM international conference on multimedia, pp 2730\u20132738","DOI":"10.1145\/3503161.3548394"},{"key":"6191_CR26","doi-asserted-by":"crossref","unstructured":"Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G, Beijbom O (2020) nuscenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 11621\u201311631","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"6191_CR27","doi-asserted-by":"crossref","unstructured":"Guizilini V, Ambrus R, Pillai S, Raventos A, Gaidon A (2020) 3d packing for self-supervised monocular depth estimation. In: IEEE Conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR42600.2020.00256"},{"key":"6191_CR28","doi-asserted-by":"crossref","unstructured":"Chen Z, Zhao H, Yuan B, Li X (2024) Stvit: Improving self-supervised multi-camera depth estimation with spatial-temporal context and adversarial geometry regularization (student abstract). In: Proceedings of the AAAI conference on artificial intelligence, vol 38, pp 23460\u201323461","DOI":"10.1609\/aaai.v38i21.30429"},{"key":"6191_CR29","doi-asserted-by":"crossref","unstructured":"Mahjourian R, Wicke M, Angelova A (2018) Unsupervised learning of depth and ego-motion from monocular video using 3d geometric constraints. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5667\u20135675","DOI":"10.1109\/CVPR.2018.00594"},{"key":"6191_CR30","doi-asserted-by":"crossref","unstructured":"Shu C, Yu K, Duan Z, Yang K (2020) Feature-metric loss for self-supervised learning of depth and egomotion. In: European conference on computer vision, Springer, pp 572\u2013588","DOI":"10.1007\/978-3-030-58529-7_34"},{"key":"6191_CR31","doi-asserted-by":"crossref","unstructured":"Poggi M, Aleotti F, Tosi F, Mattoccia S (2020) On the uncertainty of self-supervised monocular depth estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 3227\u20133237","DOI":"10.1109\/CVPR42600.2020.00329"},{"key":"6191_CR32","doi-asserted-by":"crossref","unstructured":"Yang N, Stumberg Lv, Wang R, Cremers D (2020) D3vo: Deep depth, deep pose and deep uncertainty for monocular visual odometry. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1281\u20131292","DOI":"10.1109\/CVPR42600.2020.00136"},{"key":"6191_CR33","doi-asserted-by":"crossref","unstructured":"Zhao H, Zhang J, Zhang S, Tao D (2022) Jperceiver: joint perception network for depth, pose and layout estimation in driving scenes. In: European conference on computer vision. Springer, pp 708\u2013726","DOI":"10.1007\/978-3-031-19839-7_41"},{"key":"6191_CR34","doi-asserted-by":"crossref","unstructured":"Zhao H, Bian W, Yuan B, Tao D (2020) Collaborative learning of depth estimation, visual odometry and camera relocalization from monocular videos. In: IJCAI, pp 488\u2013494","DOI":"10.24963\/ijcai.2020\/68"},{"key":"6191_CR35","doi-asserted-by":"crossref","unstructured":"Yin Z, Shi J (2018) Geonet: Unsupervised learning of dense depth, optical flow and camera pose. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1983\u20131992","DOI":"10.1109\/CVPR.2018.00212"},{"key":"6191_CR36","unstructured":"Zhou H, Greenwood D, Taylor S (2021) Self-supervised monocular depth estimation with internal feature fusion. In: British machine vision conference (BMVC)"},{"key":"6191_CR37","doi-asserted-by":"crossref","unstructured":"Zhao W, Liu S, Shu Y, Liu Y-J (2020) Towards better generalization: Joint depth-pose learning without posenet. In: Proceedings of IEEE conference on computer vision and pattern recognition","DOI":"10.1109\/CVPR42600.2020.00917"},{"key":"6191_CR38","doi-asserted-by":"crossref","unstructured":"Klingner M, Term\u00f6hlen J-A, Mikolajczyk J, Fingscheidt T (2020) Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance. In: European conference on computer vision, Springer, pp 582\u2013600","DOI":"10.1007\/978-3-030-58565-5_35"},{"key":"6191_CR39","doi-asserted-by":"crossref","unstructured":"Jung H, Park E, Yoo S (2021) Fine-grained semantics-aware representation enhancement for self-supervised monocular depth estimation. In: Proceedings of the IEEE international conference on computer vision, pp 12642\u201312652","DOI":"10.1109\/ICCV48922.2021.01241"},{"key":"6191_CR40","doi-asserted-by":"crossref","unstructured":"Bae J, Moon S, Im S (2022) Deep digging into the generalization of self-supervised monocular depth estimation. 
arXiv preprint arXiv:2205.11083","DOI":"10.1609\/aaai.v37i1.25090"},{"key":"6191_CR41","doi-asserted-by":"crossref","unstructured":"Liu Z, Li R, Shao S, Wu X, Chen W (2023) Self-supervised monocular depth estimation with self-reference distillation and disparity offset refinement. IEEE Trans Circ Syst Vid Technol","DOI":"10.1109\/TCSVT.2023.3275584"},{"key":"6191_CR42","doi-asserted-by":"crossref","unstructured":"Sun K, Xiao B, Liu D, Wang J (2019) Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5693\u20135703","DOI":"10.1109\/CVPR.2019.00584"},{"key":"6191_CR43","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: Transformers for image recognition at scale. In: International conference on learning representations"},{"key":"6191_CR44","doi-asserted-by":"crossref","unstructured":"Vankadari M, Garg S, Majumder A, Kumar S, Behera A (2020) Unsupervised monocular depth estimation for night-time images using adversarial domain feature adaptation. In: Computer vision\u2013ECCV 2020: 16th European conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXVIII 16, Springer, pp 443\u2013459","DOI":"10.1007\/978-3-030-58604-1_27"},{"key":"6191_CR45","doi-asserted-by":"crossref","unstructured":"Wang W, Xu Z, Huang H, Liu J (2022) Self-aligned concave curve: Illumination enhancement for unsupervised adaptation. In: Proceedings of the 30th ACM international conference on multimedia, pp 2617\u20132626","DOI":"10.1145\/3503161.3547991"},{"key":"6191_CR46","doi-asserted-by":"crossref","unstructured":"Zheng Y, Zhong C, Li P, Gao H-a, Zheng Y, Jin B, Wang L, Zhao H, Zhou G, Zhang Q et al (2023) Steps: Joint self-supervised nighttime image enhancement and depth estimation. Proceedings of the IEEE international conference on robotics and automation","DOI":"10.1109\/ICRA48891.2023.10160708"},{"key":"6191_CR47","doi-asserted-by":"crossref","unstructured":"Ruhkamp P, Gao D, Chen H, Navab N, Busam B (2021) Attention meets geometry: Geometry guided spatial-temporal attention for consistent self-supervised monocular depth estimation. In: 2021 International conference on 3d vision (3DV), IEEE, pp 837\u2013847","DOI":"10.1109\/3DV53792.2021.00092"},{"key":"6191_CR48","doi-asserted-by":"crossref","unstructured":"Kendall A, Martirosyan H, Dasgupta S, Henry P, Kennedy R, Bachrach A, Bry A (2017) End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE international conference on computer vision, pp 66\u201375","DOI":"10.1109\/ICCV.2017.17"},{"key":"6191_CR49","doi-asserted-by":"crossref","unstructured":"Sun D, Yang X, Liu M-Y, Kautz J (2018) Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8934\u20138943","DOI":"10.1109\/CVPR.2018.00931"},{"key":"6191_CR50","doi-asserted-by":"crossref","unstructured":"Feng C, Chen Z, Zhang C, Hu W, Li B, Lu F (2023) Iterdepth: Iterative residual refinement for outdoor self-supervised multi-frame monocular depth estimation. 
IEEE Trans Circ Syst Vid Technol","DOI":"10.1109\/TCSVT.2023.3284479"},{"key":"6191_CR51","doi-asserted-by":"crossref","unstructured":"Miao X, Bai Y, Duan H, Huang Y, Wan F, Xu X, Long Y, Zheng Y (2023) Ds-depth: Dynamic and static depth estimation via a fusion cost volume. IEEE Trans Circ Syst Vid Technol","DOI":"10.1109\/TCSVT.2023.3305776"},{"key":"6191_CR52","doi-asserted-by":"crossref","unstructured":"Shi Y, Cai H, Ansari A, Porikli F (2023) Ega-depth: Efficient guided attention for self-supervised multi-camera depth estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 119\u2013129","DOI":"10.1109\/CVPRW59228.2023.00017"},{"key":"6191_CR53","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville, A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inform Process Syst 27"},{"issue":"11","key":"6191_CR54","doi-asserted-by":"publisher","first-page":"3365","DOI":"10.1109\/TVCG.2019.2921336","volume":"26","author":"Y Jing","year":"2019","unstructured":"Jing Y, Yang Y, Feng Z, Ye J, Yu Y, Song M (2019) Neural style transfer: A review. IEEE Trans Visual Comput Graphics 26(11):3365\u20133385","journal-title":"IEEE Trans Visual Comput Graphics"},{"key":"6191_CR55","doi-asserted-by":"crossref","unstructured":"Xu W, Long C, Wang R, Wang G (2021) Drb-gan: A dynamic resblock generative adversarial network for artistic style transfer. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 6383\u20136392","DOI":"10.1109\/ICCV48922.2021.00632"},{"key":"6191_CR56","doi-asserted-by":"crossref","unstructured":"Isola P, Zhu J-Y, Zhou T, Efros AA (2017) Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1125\u20131134","DOI":"10.1109\/CVPR.2017.632"},{"key":"6191_CR57","doi-asserted-by":"crossref","unstructured":"Zhu J-Y, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223\u20132232","DOI":"10.1109\/ICCV.2017.244"},{"key":"6191_CR58","doi-asserted-by":"crossref","unstructured":"Zhu J-Y, Kr\u00e4henb\u00fchl P, Shechtman E, Efros AA (2016) Generative visual manipulation on the natural image manifold. In: Computer vision\u2013ECCV 2016: 14th European conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14, Springer, pp 597\u2013613","DOI":"10.1007\/978-3-319-46454-1_36"},{"key":"6191_CR59","doi-asserted-by":"crossref","unstructured":"Chen Z, Wang C, Yuan B, Tao D (2020) Puppeteergan: Arbitrary portrait animation with semantic-aware appearance transformation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13518\u201313527","DOI":"10.1109\/CVPR42600.2020.01353"},{"key":"6191_CR60","doi-asserted-by":"crossref","unstructured":"Chen Z, Wang C, Zhao H, Yuan B, Li X (2022) D2animator: Dual distillation of stylegan for high-resolution face animation. In: Proceedings of the 30th ACM international conference on multimedia, pp 1769\u20131778","DOI":"10.1145\/3503161.3548002"},{"key":"6191_CR61","doi-asserted-by":"crossref","unstructured":"Bousmalis K, Silberman N, Dohan D, Erhan D, Krishnan D (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3722\u20133731","DOI":"10.1109\/CVPR.2017.18"},{"key":"6191_CR62","doi-asserted-by":"crossref","unstructured":"Deng W, Zheng L, Ye Q, Kang G, Yang Y, Jiao J (2018) Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 994\u20131003","DOI":"10.1109\/CVPR.2018.00110"},{"key":"6191_CR63","doi-asserted-by":"crossref","unstructured":"CS\u00a0Kumar A, Bhandarkar SM, Prasad M (2018) Monocular depth prediction using generative adversarial networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 300\u2013308","DOI":"10.1109\/CVPRW.2018.00068"},{"key":"6191_CR64","doi-asserted-by":"crossref","unstructured":"Zhao C, Yen GG, Sun Q, Zhang C, Tang Y (2020) Masked gan for unsupervised depth and pose prediction with scale consistency. IEEE Trans Neural Netw Learn Syst","DOI":"10.1109\/TNNLS.2020.3044181"},{"issue":"10","key":"6191_CR65","doi-asserted-by":"publisher","first-page":"17039","DOI":"10.1109\/TITS.2021.3093592","volume":"23","author":"Y Xu","year":"2022","unstructured":"Xu Y, Wang Y, Huang R, Lei Z, Yang J, Li Z (2022) Unsupervised learning of depth estimation and camera pose with multi-scale gans. IEEE Trans Intell Transp Syst 23(10):17039\u201317047","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"6191_CR66","doi-asserted-by":"crossref","unstructured":"Zheng C, Cham T-J, Cai J (2018) T2net: Synthetic-to-realistic translation for solving single-image depth estimation tasks. In: Proceedings of the European conference on computer vision (ECCV), pp 767\u2013783","DOI":"10.1007\/978-3-030-01234-2_47"},{"key":"6191_CR67","doi-asserted-by":"crossref","unstructured":"Zhao S, Fu H, Gong M, Tao D (2019) Geometry-aware symmetric domain adaptation for monocular depth estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9788\u20139798","DOI":"10.1109\/CVPR.2019.01002"},{"key":"6191_CR68","doi-asserted-by":"crossref","unstructured":"Sun Q, Yen GG, Tang Y, Zhao C (2023) Learn to adapt for self-supervised monocular depth estimation. IEEE Trans Neural Netw Learn Syst","DOI":"10.1109\/TNNLS.2023.3289051"},{"key":"6191_CR69","doi-asserted-by":"crossref","unstructured":"Gaidon A, Wang Q, Cabon Y, Vig E (2016) Virtual worlds as proxy for multi-object tracking analysis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4340\u20134349","DOI":"10.1109\/CVPR.2016.470"},{"key":"6191_CR70","doi-asserted-by":"crossref","unstructured":"Wu Z, Wu X, Zhang X, Wang S, Ju L (2019) Spatial correspondence with generative adversarial network: Learning depth from monocular videos. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 7494\u20137504","DOI":"10.1109\/ICCV.2019.00759"},{"key":"6191_CR71","doi-asserted-by":"crossref","unstructured":"Wang K, Zhang Z, Yan Z, Li X, Xu B, Li J, Yang J (2021) Regularizing nighttime weirdness: Efficient self-supervised monocular depth estimation in the dark. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 16055\u201316064","DOI":"10.1109\/ICCV48922.2021.01575"},{"key":"6191_CR72","doi-asserted-by":"crossref","unstructured":"Lee Y, Kim J, Willette J, Hwang SJ (2022) Mpvit: Multi-path vision transformer for dense prediction. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 7287\u20137296","DOI":"10.1109\/CVPR52688.2022.00714"},{"issue":"4","key":"6191_CR73","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1109\/TIP.2003.819861","volume":"13","author":"Z Wang","year":"2004","unstructured":"Wang Z, Bovik AC, Sheikh HR, Simoncelli EP et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600\u2013612","journal-title":"IEEE Trans Image Process"},{"key":"6191_CR74","doi-asserted-by":"crossref","unstructured":"Godard C, Mac\u00a0Aodha O, Brostow GJ (2017) Unsupervised monocular depth estimation with left-right consistency. In: Proceedings of the IEEE conference on computer vision and pattern recognition","DOI":"10.1109\/CVPR.2017.699"},{"issue":"3","key":"6191_CR75","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1016\/S0734-189X(87)80186-X","volume":"39","author":"SM Pizer","year":"1987","unstructured":"Pizer SM, Amburn EP, Austin JD, Cromartie R, Geselowitz A, Greer T, Haar Romeny B, Zimmerman JB, Zuiderveld K (1987) Adaptive histogram equalization and its variations. Comput Vision, graph Image Process 39(3):355\u2013368","journal-title":"Comput Vision, graph Image Process"},{"key":"6191_CR76","doi-asserted-by":"crossref","unstructured":"Dijk Tv, Croon Gd (2019) How do neural networks see depth in single images? In: Proceedings of the IEEE\/CVF international conference on computer vision, pp. 2183\u20132191","DOI":"10.1109\/ICCV.2019.00227"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-024-06191-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-024-06191-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-024-06191-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T17:21:22Z","timestamp":1740244882000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-024-06191-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,16]]},"references-count":76,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,4]]}},"alternative-id":["6191"],"URL":"https:\/\/doi.org\/10.1007\/s10489-024-06191-6","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,16]]},"assertion":[{"value":"11 December 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 January 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"There is NO Competing Interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests:"}},{"value":"This article does not contain any studies with human participants performed by any of the authors. 
The data used in this study were obtained from publicly available autonomous driving datasets (e.g., Nuscenes, DDAD). All data were anonymized to protect the privacy of individuals. The use of these datasets complies with the terms and conditions set by the data providers and adheres to ethical guidelines for data usage in research.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical and informed consent for data used:"}}],"article-number":"328"}}
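On the photometric loss named in the abstract (contribution 2, STPCC): self-supervised depth methods in the cited lineage standardly combine SSIM (reference 6191_CR73) with an L1 term, as in monodepth2 (reference 6191_CR18). The sketch below is a minimal illustration of that standard loss, with a hypothetical per-image affine brightness alignment standing in for the kind of correction STPCC performs. It is not the paper's formulation; every function name and the alignment scheme are assumptions.

import torch
import torch.nn.functional as F

def ssim_dissimilarity(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-scale SSIM over 3x3 average-pooling windows, mapped to a
    # dissimilarity in [0, 1] (0 = identical), as in monodepth2.
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def align_brightness(src, tgt, eps=1e-6):
    # Hypothetical stand-in for STPCC: match each source image's per-image
    # intensity mean/std to the target's before comparing the two frames.
    s_mu = src.mean(dim=(2, 3), keepdim=True)
    s_std = src.std(dim=(2, 3), keepdim=True)
    t_mu = tgt.mean(dim=(2, 3), keepdim=True)
    t_std = tgt.std(dim=(2, 3), keepdim=True)
    return (src - s_mu) / (s_std + eps) * t_std + t_mu

def photometric_loss(warped, target, alpha=0.85):
    # Standard SSIM + L1 photometric error; alpha = 0.85 follows monodepth2.
    warped = align_brightness(warped, target)
    l1 = (warped - target).abs().mean(1, keepdim=True)
    dssim = ssim_dissimilarity(warped, target).mean(1, keepdim=True)
    return alpha * dssim + (1 - alpha) * l1

The alignment step here is deliberately crude; the abstract describes STPCC only at a high level, so the point of the sketch is merely where a brightness correction slots into the loss, not how it is computed.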
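On the adversarial geometry regularization (contribution 3, AGR): the abstract states that a GAN (reference 6191_CR53) imposes spatial constraints using unpaired depth maps. A minimal sketch under assumed design choices, namely an LSGAN objective and a small PatchGAN-style critic, neither of which is confirmed by the record:

import torch
import torch.nn as nn

class DepthCritic(nn.Module):
    # Small PatchGAN-style discriminator over single-channel depth maps;
    # outputs a grid of per-patch realism scores.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),
        )

    def forward(self, depth):
        return self.net(depth)

def agr_losses(critic, pred_depth, unpaired_real_depth):
    # LSGAN objectives: loss_critic trains the discriminator to score unpaired
    # reference depth maps as 1 and predictions as 0; loss_gen is the extra
    # term added to the depth network's loss, pushing its predictions toward
    # statistically plausible geometry. No pixel-wise pairing is required.
    d_real = critic(unpaired_real_depth)
    d_fake = critic(pred_depth.detach())  # detach: this term updates the critic only
    loss_critic = ((d_real - 1) ** 2).mean() + (d_fake ** 2).mean()
    loss_gen = ((critic(pred_depth) - 1) ** 2).mean()
    return loss_critic, loss_gen

Detaching the prediction in the critic's loss keeps the two updates independent; only loss_gen backpropagates into the depth network, which is what lets unpaired depth maps act as a regularizer under adverse conditions such as rain or nighttime driving.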