{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T11:37:54Z","timestamp":1763811474715,"version":"3.41.0"},"reference-count":83,"publisher":"Association for Computing Machinery (ACM)","issue":"10","license":[{"start":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T00:00:00Z","timestamp":1730246400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U2033218"],"award-info":[{"award-number":["U2033218"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shanghai Local Capacity Enhancement project","award":["21010501500"],"award-info":[{"award-number":["21010501500"]}]},{"name":"\u201cScience and Technology Innovation Action Plan\u201d of Shanghai Science and Technology Commission for social development project","award":["21DZ1204900"],"award-info":[{"award-number":["21DZ1204900"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,10,31]]},"abstract":"<jats:p>Three-dimensional perception in intelligent virtual and augmented reality (VR\/AR) and autonomous vehicles (AV) applications is critical and attracting significant attention. The self-supervised monocular depth and ego-motion estimation serves as a more intelligent learning approach that provides the required scene depth and location for 3D perception. However, the existing self-supervised learning methods suffer from scale ambiguity, boundary blur, and imbalanced depth distribution, limiting the practical applications of VR\/AR and AV. 
In this article, we propose a new self-supervised learning framework based on superpixel and normal constraints to address these problems. Specifically, we formulate a novel 3D edge structure consistency loss to alleviate the boundary blur of depth estimation. To address the scale ambiguity of estimated depth and ego-motion, we propose a novel surface normal network for efficient camera height estimation. The surface normal network is composed of a deep fusion module and a full-scale hierarchical feature aggregation module. Meanwhile, to realize the global smoothing and boundary discriminability of the predicted normal map, we introduce a novel fusion loss which is based on the consistency constraints of the normal in edge domains and superpixel regions. Experiments are conducted on several benchmarks, and the results illustrate that the proposed approach outperforms the state-of-the-art methods in depth, ego-motion, and surface normal estimation.<\/jats:p>","DOI":"10.1145\/3674977","type":"journal-article","created":{"date-parts":[[2024,7,1]],"date-time":"2024-07-01T13:51:58Z","timestamp":1719841918000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Monocular Depth and Ego-motion Estimation with Scale Based on Superpixel and Normal Constraints"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3653-0858","authenticated-orcid":false,"given":"Junxin","family":"Lu","sequence":"first","affiliation":[{"name":"East China Normal University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9930-0502","authenticated-orcid":false,"given":"Yongbin","family":"Gao","sequence":"additional","affiliation":[{"name":"Shanghai University of Engineering Science, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7224-8756","authenticated-orcid":false,"given":"Jieyu","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai 
University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8877-2421","authenticated-orcid":false,"given":"Jeng-Neng","family":"Hwang","sequence":"additional","affiliation":[{"name":"University of Washington, Seattle, WA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5256-210X","authenticated-orcid":false,"given":"Hamido","family":"Fujita","sequence":"additional","affiliation":[{"name":"Iwate Prefectural University, Iwate, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8563-5678","authenticated-orcid":false,"given":"Zhijun","family":"Fang","sequence":"additional","affiliation":[{"name":"Shanghai University of Engineering Science, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2024,10,30]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.120"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i1.25090"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209661"},{"key":"e_1_3_2_5_2","first-page":"726","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Bangunharcana Antyanta","year":"2023","unstructured":"Antyanta Bangunharcana, Ahmed Magd, and Kyung-Soo Kim. 2023. DualRefine: Self-supervised depth and pose estimation through iterative epipolar sampling and refinement toward equilibrium. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 726\u2013738."},{"key":"e_1_3_2_6_2","first-page":"35","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"32","author":"Bian Jiawang","year":"2019","unstructured":"Jiawang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, and Ian Reid. 2019. Unsupervised scale-consistent depth and ego-motion learning from monocular video. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 32. 
35\u201345."},{"issue":"8","key":"e_1_3_2_7_2","first-page":"2674","article-title":"Monocular depth estimation with augmented ordinal depth relationships","volume":"30","author":"Cao Yuanzhouhan","year":"2018","unstructured":"Yuanzhouhan Cao, Tianqi Zhao, Ke Xian, Chunhua Shen, Zhiguo Cao, and Shugong Xu. 2018. Monocular depth estimation with augmented ordinal depth relationships. IEEE Transactions on Image Processing 30, 8 (2018), 2674\u20132682.","journal-title":"IEEE Transactions on Image Processing"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018001"},{"key":"e_1_3_2_9_2","first-page":"381","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Casser Vincent","year":"2019","unstructured":"Vincent Casser, Soeren Pirk, Reza Mahjourian, and Anelia Angelova. 2019b. Unsupervised monocular depth and ego-motion learning with structure and semantics. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 381\u2013388."},{"key":"e_1_3_2_10_2","first-page":"12808","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Choi Hyesong","year":"2021","unstructured":"Hyesong Choi, Hunsang Lee, Sunkyung Kim, Sunok Kim, Seungryong Kim, Kwanghoon Sohn, and Dongbo Min. 2021. Adaptive confidence thresholding for monocular depth estimation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 12808\u201312818."},{"key":"e_1_3_2_11_2","unstructured":"Jaehoon Choi, Dongki Jung, Donghwan Lee, and Changick Kim. 2020. SAFENet: Self-supervised monocular depth estimation with semantic-aware feature extraction. arXiv:2010.02893. 
Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:222142362"},{"issue":"7","key":"e_1_3_2_12_2","doi-asserted-by":"crossref","first-page":"6654","DOI":"10.1109\/TITS.2021.3060001","article-title":"Deep learning-based incorporation of planar constraints for robust stereo depth estimation in autonomous vehicle applications","volume":"23","author":"Chuah Weiqin","year":"2021","unstructured":"Weiqin Chuah, Ruwan Tennakoon, Reza Hoseinnezhad, and Alireza Bab-Hadiashar. 2021. Deep learning-based incorporation of planar constraints for robust stereo depth estimation in autonomous vehicle applications. IEEE Transactions on Intelligent Transportation Systems 23, 7 (2021), 6654\u20136665.","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.350"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.304"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","unstructured":"David Eigen, Christian Puhrsch, and Rob Fergus. 2014. Depth map prediction from a single image using a multi-scale deep network. arXiv:1406.2283. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1406.2283","DOI":"10.48550\/arXiv.1406.2283"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00214"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913491297"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354978"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.699"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00393"},{"key":"e_1_3_2_21_2","first-page":"2886","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Goldman Matan","year":"2019","unstructured":"Matan Goldman, Tal Hassner, and Shai Avidan. 2019. 
Learn stereo, infer mono: Siamese networks for self-supervised, monocular, depth estimation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2886\u20132895."},{"key":"e_1_3_2_22_2","first-page":"254","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Gonzalez Juan Luis","year":"2023","unstructured":"Juan Luis Gonzalez, Jaeho Moon, and Munchurl Kim. 2023. Detail-preserving self-supervised monocular depth with self-supervised structural sharpening. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 254\u2013264."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00907"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00256"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_26_2","first-page":"1055","volume-title":"Proceedings of the ICASSP 2020\u20132020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Huang Huimin","year":"2020","unstructured":"Huimin Huang, Lanfen Lin, Ruofeng Tong, Hongjie Hu, Qiaowei Zhang, Yutaro Iwamoto, Xianhua Han, Yen-Wei Chen, and Jian Wu. 2020. UNet 3+: A full-scale connected UNet for medical image segmentation. In Proceedings of the ICASSP 2020\u20132020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 1055\u20131059."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00172"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2936024"},{"key":"e_1_3_2_29_2","first-page":"53","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","author":"Jiao Jianbo","year":"2018","unstructured":"Jianbo Jiao, Ying Cao, Yibing Song, and Rynson Lau. 2018. 
Look deeper into depth: Monocular depth estimation with semantic booster and attention-driven loss. In Proceedings of the European Conference on Computer Vision (ECCV). 53\u201369."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00481"},{"key":"e_1_3_2_31_2","first-page":"4220","volume-title":"Proceedings of the 2014 IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Jordan Krzysztof","year":"2014","unstructured":"Krzysztof Jordan and Philippos Mordohai. 2014. A quantitative evaluation of surface normal estimation in point clouds. In Proceedings of the 2014 IEEE\/RSJ International Conference on Intelligent Robots and Systems. IEEE, 4220\u20134226."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01241"},{"key":"e_1_3_2_33_2","first-page":"582","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Klingner Marvin","year":"2020","unstructured":"Marvin Klingner, Jan-Aike Term\u00f6hlen, Jonas Mikolajczyk, and Tim Fingscheidt. 2020. Self-supervised monocular depth estimation: Solving the dynamic object problem by semantic guidance. In Proceedings of the European Conference on Computer Vision. Springer, 582\u2013600."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2016.32"},{"key":"e_1_3_2_35_2","first-page":"1908","volume-title":"Proceedings of the 4th Conference on Robot Learning (CoRL \u201920)","volume":"155","author":"Li Hanhan","year":"2020","unstructured":"Hanhan Li, Ariel Gordon, Hang Zhao, Vincent Casser, and Anelia Angelova. 2020. Unsupervised monocular depth learning in dynamic scenes. In Proceedings of the 4th Conference on Robot Learning (CoRL \u201920), Vol. 155. 
1908\u20131917."},{"key":"e_1_3_2_36_2","first-page":"361","volume-title":"Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO \u201917)","author":"Li Tingguang","year":"2017","unstructured":"Tingguang Li, Delong Zhu, and Max Q.-H. Meng. 2017. A hybrid 3DOF pose estimation method based on camera and lidar data. In Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO \u201917). IEEE, 361\u2013366."},{"key":"e_1_3_2_37_2","first-page":"2294","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Lyu Xiaoyang","year":"2021","unstructured":"Xiaoyang Lyu, Liang Liu, Mengmeng Wang, Xin Kong, Lina Liu, Yong Liu, Xinxin Chen, and Yi Yuan. 2021. HR-Depth: High resolution self-supervised monocular depth estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 2294\u20132301."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00594"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2015.2463671"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","unstructured":"Mertalp Ocal and Armin Mustafa. 2020. RealMonoDepth: Self-supervised monocular depth estimation for general scenes. arXiv:2004.06267. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2004.06267","DOI":"10.48550\/arXiv.2004.06267"},{"issue":"2","key":"e_1_3_2_41_2","first-page":"1","article-title":"Monocular vision aided depth measurement from RGB images for autonomous UAV navigation","volume":"20","author":"Padhy Ram Prasad","year":"2022","unstructured":"Ram Prasad Padhy, Pankaj Kumar Sa, Fabio Narducci, Carmen Bisogni, and Sambit Bakshi. 2022. Monocular vision aided depth measurement from RGB images for autonomous UAV navigation. 
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 20, 2 (2022), 1\u201322.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"},{"key":"e_1_3_2_42_2","first-page":"15560","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Peng Rui","year":"2021","unstructured":"Rui Peng, Ronggang Wang, Yawen Lai, Luyang Tang, and Yangang Cai. 2021. Excavating the potential capacity of self-supervised monocular depth estimation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 15560\u201315569."},{"key":"e_1_3_2_43_2","first-page":"1578","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Petrovai Andra","year":"2022","unstructured":"Andra Petrovai and Sergiu Nedevschi. 2022. Exploiting pseudo labels in a self-supervised learning framework for improved monocular depth estimation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1578\u20131588."},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1109\/3DV.2018.00073","volume-title":"Proceedings of the International Conference on 3D Vision (3DV \u201918)","author":"Pilzer Andrea","year":"2018","unstructured":"Andrea Pilzer, Dan Xu, Mihai Puscas, Elisa Ricci, and Nicu Sebe. 2018. Unsupervised adversarial depth estimation using cycled generative networks. In Proceedings of the International Conference on 3D Vision (3DV \u201918). IEEE, 587\u2013595."},{"key":"e_1_3_2_45_2","first-page":"363","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV) Workshops","author":"Pinard Cl\u00e9ment","year":"2018","unstructured":"Cl\u00e9ment Pinard, Laure Chevalley, Antoine Manzanera, and David Filliat. 2018. Learning structure-from-motion from motion. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops. 
363\u2013376."},{"key":"e_1_3_2_46_2","first-page":"283","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Qi Xiaojuan","year":"2018","unstructured":"Xiaojuan Qi, Renjie Liao, Zhengzhe Liu, Raquel Urtasun, and Jiaya Jia. 2018. GeoNet: Geometric neural network for joint depth and surface normal estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 283\u2013291."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3588571"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.440"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/38.814551"},{"key":"e_1_3_2_51_2","first-page":"1735","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS \u201919)","author":"Roussel Tom","year":"2019","unstructured":"Tom Roussel, Luc Van Eycken, and Tinne Tuytelaars. 2019. Monocular depth estimation in new environments with absolute scale. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS \u201919). IEEE, 1735\u20131741."},{"key":"e_1_3_2_52_2","first-page":"5506","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Roy Anirban","year":"2016","unstructured":"Anirban Roy and Sinisa Todorovic. 2016. Monocular depth estimation using neural regression forest. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
5506\u20135514."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.132"},{"key":"e_1_3_2_54_2","first-page":"6359","volume-title":"Proceedings of the International Conference on Robotics and Automation","author":"Shen Tianwei","year":"2019","unstructured":"Tianwei Shen, Zixin Luo, Lei Zhou, Hanyu Deng, Runze Zhang, Tian Fang, and Long Quan. 2019. Beyond photometric loss for self-supervised ego-motion estimation. In Proceedings of the International Conference on Robotics and Automation. IEEE, 6359\u20136365."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-019-09881-0"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3074306"},{"key":"e_1_3_2_58_2","first-page":"988","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Swami Kunal","year":"2022","unstructured":"Kunal Swami, Amrit Muduli, Uttam Gurram, and Pankaj Bajpai. 2022. Do what you can, with what you have: Scale-aware and high quality monocular depth estimation without real world labels. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 988\u2013997."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS51168.2021.9635938"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2968751"},{"key":"e_1_3_2_61_2","first-page":"541","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Wang Lijun","year":"2020","unstructured":"Lijun Wang, Jianming Zhang, Oliver Wang, Zhe Lin, and Huchuan Lu. 2020b. SDC-Depth: Semantic divide-and-conquer network for monocular depth estimation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
541\u2013550."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00570"},{"key":"e_1_3_2_63_2","first-page":"539","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Xiaolong","year":"2015","unstructured":"Xiaolong Wang, David Fouhey, and Abhinav Gupta. 2015. Designing deep networks for surface normal estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 539\u2013547."},{"key":"e_1_3_2_64_2","first-page":"988","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA \u201918)","author":"Wang Xiangwei","year":"2018","unstructured":"Xiangwei Wang, Hui Zhang, Xiaochuan Yin, Mingxiao Du, and Qijun Chen. 2018. Monocular visual odometry scale recovery using geometrical constraint. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA \u201918). IEEE, 988\u2013995."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_2_66_2","first-page":"2162","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Watson Jamie","year":"2019","unstructured":"Jamie Watson, Michael Firman, Gabriel J Brostow, and Daniyar Turmukhambetov. 2019. Self-supervised monocular depth hints. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 2162\u20132171."},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00122"},{"key":"e_1_3_2_68_2","first-page":"1204","article-title":"Fast monocular depth estimation via side prediction aggregation with continuous spatial refinement","volume":"25","author":"Wu Jipeng","year":"2022","unstructured":"Jipeng Wu, Rongrong Ji, Qiang Wang, Shengchuan Zhang, Xiaoshuai Sun, Yan Wang, Mingliang Xu, and Feiyue Huang. 2022. Fast monocular depth estimation via side prediction aggregation with continuous spatial refinement. 
IEEE Transactions on Multimedia 25 (2022), 1204\u20131216.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_2_69_2","first-page":"963","volume-title":"Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence (IJCAI)","author":"Xiong Mingkang","year":"2020","unstructured":"Mingkang Xiong, Zhenghong Zhang, Weilin Zhong, Jinsheng Ji, Jiyuan Liu, and Huilin Xiong. 2020. Self-supervised monocular depth and visual odometry learning with scale-consistent geometric constraints. In Proceedings of the 29th International Conference on International Joint Conferences on Artificial Intelligence (IJCAI). 963\u2013969."},{"key":"e_1_3_2_70_2","first-page":"2330","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS \u201920)","author":"Xue Feng","year":"2020","unstructured":"Feng Xue, Guirong Zhuo, Ziyuan Huang, Wufei Fu, Zhuoyue Wu, and Marcelo H Ang. 2020. Toward hierarchical self-supervised monocular absolute depth estimation for autonomous driving applications. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS \u201920). IEEE, 2330\u20132337."},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00031"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","unstructured":"Zhenheng Yang, Peng Wang, Wei Xu, Liang Zhao, and Ramakant Nevatia. 2017. Unsupervised learning of geometry with edge-aware depth-normal consistency. arXiv:1711.03665. 
Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1711.03665","DOI":"10.48550\/arXiv.1711.03665"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00212"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00631"},{"key":"e_1_3_2_75_2","doi-asserted-by":"crossref","first-page":"4811","DOI":"10.1109\/ICRA.2019.8793984","volume-title":"Proceedings of the International Conference on Robotics and Automation (ICRA \u201919)","author":"Zhan Huangying","year":"2019","unstructured":"Huangying Zhan, Chamara Saroj Weerasekera, Ravi Garg, and Ian Reid. 2019. Self-supervised learning for single view depth and surface normal estimation. In Proceedings of the International Conference on Robotics and Automation (ICRA \u201919). IEEE, 4811\u20134817."},{"key":"e_1_3_2_76_2","doi-asserted-by":"crossref","unstructured":"Yourun Zhang, Maoguo Gong, Jianzhao Li, Mingyang Zhang, Fenlong Jiang, and Hongyu Zhao. 2022. Self-supervised monocular depth estimation with multiscale perception. IEEE Transactions on Image Processing 31 (2022), 3251\u20133266.","DOI":"10.1109\/TIP.2022.3167307"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00423"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.3044181"},{"issue":"9","key":"e_1_3_2_79_2","doi-asserted-by":"crossref","first-page":"5580","DOI":"10.1109\/TNNLS.2021.3129801","article-title":"Distance transform pooling neural network for LiDAR depth completion","volume":"34","author":"Zhao Yiming","year":"2021","unstructured":"Yiming Zhao, Mahdi Elhousni, Ziming Zhang, and Xinming Huang. 2021. Distance transform pooling neural network for LiDAR depth completion. 
IEEE Transactions on Neural Networks and Learning Systems 34, 9 (2021), 5580\u20135589.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"2","key":"e_1_3_2_80_2","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1109\/TITS.2019.2900330","article-title":"Ground-plane-based absolute scale estimation for monocular visual odometry","volume":"21","author":"Zhou Dingfu","year":"2019","unstructured":"Dingfu Zhou, Yuchao Dai, and Hongdong Li. 2019. Ground-plane-based absolute scale estimation for monocular visual odometry. IEEE Transactions on Intelligent Transportation Systems 21, 2 (2019), 791\u2013802.","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"e_1_3_2_81_2","first-page":"378","volume-title":"Self-Supervised Monocular Depth Estimation with Internal Feature Fusion","author":"Zhou Hang","year":"2021","unstructured":"Hang Zhou, David Greenwood, and Sarah Taylor. 2021. Self-Supervised Monocular Depth Estimation with Internal Feature Fusion. BMVA Press, 378."},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.700"},{"key":"e_1_3_2_83_2","first-page":"13116","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhu Shengjie","year":"2020","unstructured":"Shengjie Zhu, Garrick Brazil, and Xiaoming Liu. 2020. The edge of depth: Explicit constraints between segmentation and depth. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
13116\u201313125."},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01228-1_3"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674977","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674977","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:56Z","timestamp":1750291556000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674977"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,30]]},"references-count":83,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,31]]}},"alternative-id":["10.1145\/3674977"],"URL":"https:\/\/doi.org\/10.1145\/3674977","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2024,10,30]]},"assertion":[{"value":"2023-07-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-22","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}