{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T15:10:05Z","timestamp":1751382605437,"version":"3.41.0"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["2022JBQY009"],"award-info":[{"award-number":["2022JBQY009"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key R&D Program","award":["2022YFB2603302"],"award-info":[{"award-number":["2022YFB2603302"]}]},{"DOI":"10.13039\/501100001809","name":"National Nature Science Foundation of China","doi-asserted-by":"crossref","award":["51827813"],"award-info":[{"award-number":["51827813"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>Iterative inference approaches have shown promising success in the task of multi-view depth estimation. However, these methods put excessive emphasis on the universal inter-view correspondences while neglecting the correspondence ambiguity in regions of low texture and depth discontinuous areas. Thus, they are prone to produce inaccurate or even erroneous depth estimations, which is further exacerbated due to cumulative errors especially in the iterative pipeline, providing unreliable information in many real-world scenarios. In this article, we revisit this issue from the intra-view contextual hints and introduce a novel enhancing iterative approach, named EnIter. 
Concretely, at the beginning of each iteration, we present a Depth Intercept (DI) modulator to provide more accurate depth by aggregating neighbor uncertainty, correlation volume of reference and normal. This plug and play modulator is effective at intercepting the erroneous depth estimations with implicit guidance from the universal correlation contextual hints, especially for the challenging regions. Furthermore, at the end of each iteration, we refine the depth map with another plug and play modulator termed as Depth Refine (DR). It mines the latent structure knowledge of reference contextual hints and establishes one-way dependency using local attention from reference features to depth, yielding delicate depth in detail. Extensive experiment demonstrates that our method not only achieves state-of-the-art performance over existing models but also exhibits remarkable universality in popular iterative pipelines, e.g., CasMVS, UCSNet, TransMVS, and UniMVS.<\/jats:p>","DOI":"10.1145\/3731760","type":"journal-article","created":{"date-parts":[[2025,4,24]],"date-time":"2025-04-24T12:36:55Z","timestamp":1745498215000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["EnIter: Enhancing Iterative Multi-View Depth Estimation with Universal Contextual Hints"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6690-1819","authenticated-orcid":false,"given":"Qianqian","family":"Du","sequence":"first","affiliation":[{"name":"State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing Jiaotong University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4226-4368","authenticated-orcid":false,"given":"Hui","family":"Yin","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Advanced Rail Autonomous 
Operation, Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Frontiers Science Center for Smart High-Speed Railway System, Beijing Jiaotong University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7776-889X","authenticated-orcid":false,"given":"Lang","family":"Nie","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0828-4023","authenticated-orcid":false,"given":"Yanting","family":"Liu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Beijing for Railway Engineering, School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9245-0110","authenticated-orcid":false,"given":"Jin","family":"Wan","sequence":"additional","affiliation":[{"name":"Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0902-9"},{"key":"e_1_3_1_3_2","first-page":"5861","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition Workshops","author":"Agarwal Ashutosh","year":"2023","unstructured":"Ashutosh Agarwal and Chetan Arora. 2023. Attention attention everywhere: Monocular depth prediction with skip attention. 
In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 5861\u20135870."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2001269.2001293"},{"key":"e_1_3_1_5_2","first-page":"2842","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Bae Gwangbin","year":"2022","unstructured":"Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla. 2022. Multi-view depth estimation by fusing single-view depth probability with multi-view geometry. In IEEE Conference on Computer Vision and Pattern Recognition, 2842\u20132851."},{"key":"e_1_3_1_6_2","first-page":"4009","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Bhat Shariq Farooq","year":"2021","unstructured":"Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. 2021. Adabins: Depth estimation using adaptive bins. In IEEE Conference on Computer Vision and Pattern Recognition, 4009\u20134018."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAI.2024.3427068"},{"key":"e_1_3_1_8_2","first-page":"10138","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Cheng JunDa","year":"2024","unstructured":"JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, and Xin Yang. 2024. Adaptive fusion of single-view and multi-view depth for autonomous driving. In IEEE Conference on Computer Vision and Pattern Recognition, 10138\u201310147."},{"key":"e_1_3_1_9_2","first-page":"2524","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Cheng Shuo","year":"2020","unstructured":"Shuo Cheng, Zexiang Xu, Shilin Zhu, Zhuwen Li, Li Erran Li, Ravi Ramamoorthi, and Hao Su. 2020. Deep stereo using adaptive thin volume representation with uncertainty awareness. 
In IEEE Conference on Computer Vision and Pattern Recognition, 2524\u20132534."},{"key":"e_1_3_1_10_2","first-page":"103","volume-title":"European Conference on Computer Vision","author":"Cheng Xinjing","year":"2018","unstructured":"Xinjing Cheng, Peng Wang, and Ruigang Yang. 2018. Depth estimation via affinity learned with convolutional spatial propagation network. In European Conference on Computer Vision, 103\u2013119."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.261"},{"key":"e_1_3_1_12_2","first-page":"8585","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Ding Yikang","year":"2022","unstructured":"Yikang Ding, Wentao Yuan, Qingtian Zhu, Haotian Zhang, Xiangyue Liu, Yuanjiang Wang, and Xiao Liu. 2022. Transmvsnet: Global context-aware multi-view stereo network with transformers. In IEEE Conference on Computer Vision and Pattern Recognition, 8585\u20138594."},{"key":"e_1_3_1_13_2","first-page":"2366","article-title":"Depth map prediction from a single image using a multi-scale deep network","author":"Eigen David","year":"2014","unstructured":"David Eigen, Christian Puhrsch, and Rob Fergus. 2014. Depth map prediction from a single image using a multi-scale deep network. 
In Advances in Neural Information Processing Systems, 2366\u20132374.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3284479"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00214"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913491297"},{"issue":"5","key":"e_1_3_1_17_2","doi-asserted-by":"crossref","first-page":"2844","DOI":"10.1109\/LRA.2023.3260724","article-title":"Dro: Deep recurrent optimizer for video to depth","volume":"8","author":"Gu Xiaodong","year":"2023","unstructured":"Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, and Ping Tan. 2023. Dro: Deep recurrent optimizer for video to depth. IEEE Robotics and Automation Letters 8, 5 (2023), 2844\u20132851.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"e_1_3_1_18_2","first-page":"2495","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Gu Xiaodong","year":"2020","unstructured":"Xiaodong Gu, Zhiwen Fan, Siyu Zhu, Zuozhuo Dai, Feitong Tan, and Ping Tan. 2020. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In IEEE Conference on Computer Vision and Pattern Recognition, 2495\u20132504."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00256"},{"key":"e_1_3_1_20_2","first-page":"1618","volume-title":"IEEE International Conference on Intelligent Robots and Systems","author":"H\u00e4ne Christian","year":"2011","unstructured":"Christian H\u00e4ne, Christopher Zach, Jongwoo Lim, Ananth Ranganathan, and Marc Pollefeys. 2011. Stereo depth map fusion for robot navigation. 
In IEEE International Conference on Intelligent Robots and Systems, 1618\u20131625."},{"key":"e_1_3_1_21_2","volume-title":"International Conference on Learning Representations","author":"Im Sunghoon","year":"2019","unstructured":"Sunghoon Im, Hae-Gon Jeon, Stephen Lin, and In So Kweon. 2019. DPSNet: End-to-end deep plane sweep stereo. In International Conference on Learning Representations. arXiv:1905.00538. Retrieved from https:\/\/arxiv.org\/abs\/1905.00538"},{"issue":"4","key":"e_1_3_1_22_2","doi-asserted-by":"crossref","first-page":"7791","DOI":"10.1109\/LRA.2021.3101049","article-title":"ADAADepth: Adapting data augmentation and attention for self-supervised monocular depth estimation","volume":"6","author":"Kaushik Vinay","year":"2021","unstructured":"Vinay Kaushik, Kartik Jindgar, and Brejesh Lall. 2021. ADAADepth: Adapting data augmentation and attention for self-supervised monocular depth estimation. IEEE Robotics and Automation Letters 6, 4 (2021), 7791\u20137798.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2836318"},{"key":"e_1_3_1_24_2","first-page":"2189","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Kusupati Uday","year":"2020","unstructured":"Uday Kusupati, Shuo Cheng, Rui Chen, and Hao Su. 2020. Normal assisted stereo depth estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2189\u20132199."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DV.2016.32"},{"key":"e_1_3_1_26_2","unstructured":"Jin Han Lee Myung-Kyu Han Dong Wook Ko and Il Hong Suh. 2019. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv:1907.10326. 
Retrieved from https:\/\/arxiv.org\/abs\/1907.10326"},{"key":"e_1_3_1_27_2","first-page":"1873","volume-title":"AAAI Conference on Artificial Intelligence","author":"Lee Sihaeng","year":"2021","unstructured":"Sihaeng Lee, Janghyeon Lee, Byungju Kim, Eojindl Yi, and Junmo Kim. 2021. Patch-wise attention network for monocular depth estimation. In AAAI Conference on Artificial Intelligence, 1873\u20131881."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3272170"},{"key":"e_1_3_1_29_2","first-page":"21539","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Li Rui","year":"2023","unstructured":"Rui Li, Dong Gong, Wei Yin, Hao Chen, Yu Zhu, Kaixuan Wang, Xiaozhi Chen, Jinqiu Sun, and Yanning Zhang. 2023. Learning to fuse monocular and multi-view cues for multi-frame depth estimation in dynamic scenes. In IEEE Conference on Computer Vision and Pattern Recognition, 21539\u201321548."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11633-023-1458-0"},{"key":"e_1_3_1_31_2","first-page":"10986","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Liu Chao","year":"2019","unstructured":"Chao Liu, Jinwei Gu, Kihwan Kim, Srinivasa G. Narasimhan, and Jan Kautz. 2019. Neural rgb (r) d sensing: Depth and uncertainty from a video camera. In IEEE Conference on Computer Vision and Pattern Recognition, 10986\u201310995."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3638559"},{"key":"e_1_3_1_33_2","first-page":"1520","article-title":"Learning affinity via spatial propagation networks","author":"Liu Sifei","year":"2017","unstructured":"Sifei Liu, Shalini De Mello, Jinwei Gu, Guangyu Zhong, Ming-Hsuan Yang, and Jan Kautz. 2017. Learning affinity via spatial propagation networks. 
In Advances in Neural Information Processing Systems, 1520\u20131530.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_34_2","first-page":"10012","volume-title":"IEEE International Conference on Computer Vision","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In IEEE International Conference on Computer Vision, 10012\u201310022."},{"key":"e_1_3_1_35_2","first-page":"8258","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Long Xiaoxiao","year":"2021","unstructured":"Xiaoxiao Long, Lingjie Liu, Wei Li, Christian Theobalt, and Wenping Wang. 2021. Multi-view depth estimation using epipolar spatio-temporal networks. In IEEE Conference on Computer Vision and Pattern Recognition, 8258\u20138267."},{"key":"e_1_3_1_36_2","first-page":"640","volume-title":"European Conference on Computer Vision","author":"Long Xiaoxiao","year":"2020","unstructured":"Xiaoxiao Long, Lingjie Liu, Christian Theobalt, and Wenping Wang. 2020. Occlusion-aware depth estimation with adaptive normal constraints. In European Conference on Computer Vision, 640\u2013657."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3674977"},{"key":"e_1_3_1_38_2","first-page":"10452","volume-title":"IEEE International Conference on Computer Vision","author":"Luo Keyang","year":"2019","unstructured":"Keyang Luo, Tao Guan, Lili Ju, Haipeng Huang, and Yawei Luo. 2019. P-mvsnet: Learning patch-wise matching confidence aggregation for multi-view stereo. In IEEE International Conference on Computer Vision, 10452\u201310461."},{"key":"e_1_3_1_39_2","first-page":"5732","volume-title":"IEEE International Conference on Computer Vision","author":"Ma Xinjun","year":"2021","unstructured":"Xinjun Ma, Yue Gong, Qirui Wang, Jingwei Huang, Lei Chen, and Fan Yu. 2021. 
Epp-mvsnet: Epipolar-assembling based depth prediction for multi-view stereo. In IEEE International Conference on Computer Vision, 5732\u20135740."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19821-2_42"},{"key":"e_1_3_1_41_2","first-page":"1610","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Patil Vaishakh","year":"2022","unstructured":"Vaishakh Patil, Christos Sakaridis, Alexander Liniger, and Luc Van Gool. 2022. P3depth: Monocular depth estimation with a piecewise planarity prior. In IEEE Conference on Computer Vision and Pattern Recognition, 1610\u20131621."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3017478"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3663570"},{"key":"e_1_3_1_44_2","first-page":"8645","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Peng Rui","year":"2022","unstructured":"Rui Peng, Rongjie Wang, Zhenyu Wang, Yawen Lai, and Ronggang Wang. 2022. Rethinking depth estimation for multi-view stereo: A unified representation. In IEEE Conference on Computer Vision and Pattern Recognition, 8645\u20138654."},{"key":"e_1_3_1_45_2","first-page":"21477","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Piccinelli Luigi","year":"2023","unstructured":"Luigi Piccinelli, Christos Sakaridis, and Fisher Yu. 2023. iDisc: Internal discretization for monocular depth estimation. 
In IEEE Conference on Computer Vision and Pattern Recognition, 21477\u201321487."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3224810"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3411571"},{"key":"e_1_3_1_48_2","first-page":"1","article-title":"Iebins: Iterative elastic bins for monocular depth estimation","author":"Shao Shuwei","year":"2024","unstructured":"Shuwei Shao, Zhongcai Pei, Xingming Wu, Zhong Liu, Weihai Chen, and Zhengguo Li. 2024. Iebins: Iterative elastic bins for monocular depth estimation. In Advances in Neural Information Processing Systems, 1\u20138.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_49_2","first-page":"2930","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Shotton Jamie","year":"2013","unstructured":"Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. 2013. Scene coordinate regression forests for camera relocalization in rgb-d images. In IEEE Conference on Computer Vision and Pattern Recognition, 2930\u20132937."},{"key":"e_1_3_1_50_2","first-page":"104","volume-title":"European Conference on Computer Vision","author":"Sinha Ayan","year":"2020","unstructured":"Ayan Sinha, Zak Murez, James Bartolozzi, Vijay Badrinarayanan, and Andrew Rabinovich. 2020. Deltas: Depth estimation by learning triangulation and densification of sparse points. In European Conference on Computer Vision, 104\u2013121."},{"key":"e_1_3_1_51_2","first-page":"2348","volume-title":"AAAI Conference on Artificial Intelligence","volume":"37","author":"Su Wanjuan","year":"2023","unstructured":"Wanjuan Su and Wenbing Tao. 2023. Efficient edge-preserving multi-view stereo network for depth estimation. In AAAI Conference on Artificial Intelligence, Vol. 
37, 2348\u20132356."},{"key":"e_1_3_1_52_2","first-page":"8606","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Fangjinhua","year":"2022","unstructured":"Fangjinhua Wang, Silvano Galliani, Christoph Vogel, and Marc Pollefeys. 2022. IterMVS: Iterative probability estimation for efficient multi-view stereo. In IEEE Conference on Computer Vision and Pattern Recognition, 8606\u20138615."},{"key":"e_1_3_1_53_2","first-page":"14194","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Fangjinhua","year":"2021","unstructured":"Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Pablo Speciale, and Marc Pollefeys. 2021. Patchmatchnet: Learned multi-view patchmatch stereo. In IEEE Conference on Computer Vision and Pattern Recognition, 14194\u201314203."},{"key":"e_1_3_1_54_2","first-page":"248","volume-title":"International Conference on 3d Vision","author":"Wang Kaixuan","year":"2018","unstructured":"Kaixuan Wang and Shaojie Shen. 2018. Mvdepthnet: Real-time multiview depth estimation neural network. In International Conference on 3d Vision, 248\u2013257."},{"key":"e_1_3_1_55_2","first-page":"2689","volume-title":"AAAI Conference on Artificial Intelligence","author":"Wang Xiaofeng","year":"2023","unstructured":"Xiaofeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, and Xingang Wang. 2023. Crafting monocular cues and velocity guidance for self-supervised multi-frame depth learning. In AAAI Conference on Artificial Intelligence, 2689\u20132697."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_3_1_57_2","first-page":"7494","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Wu Zhenyao","year":"2019","unstructured":"Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, and Lili Ju. 2019. Spatial correspondence with generative adversarial network: Learning depth from monocular videos. 
In IEEE Conference on Computer Vision and Pattern Recognition, 7494\u20137504."},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58548-8_39"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01596"},{"key":"e_1_3_1_60_2","first-page":"767","volume-title":"European Conference on Computer Vision","author":"Yao Yao","year":"2018","unstructured":"Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan. 2018. Mvsnet: Depth inference for unstructured multi-view stereo. In European Conference on Computer Vision, 767\u2013783."},{"key":"e_1_3_1_61_2","first-page":"5525","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Yao Yao","year":"2019","unstructured":"Yao Yao, Zixin Luo, Shiwei Li, Tianwei Shen, Tian Fang, and Long Quan. 2019. Recurrent mvsnet for high-resolution multi-view stereo depth inference. In IEEE Conference on Computer Vision and Pattern Recognition, 5525\u20135534."},{"key":"e_1_3_1_62_2","first-page":"3916","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Yuan Weihao","year":"2022","unstructured":"Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, and Ping Tan. 2022. Neural window fully-connected crfs for monocular depth estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 3916\u20133925."},{"key":"e_1_3_1_63_2","first-page":"3782","volume-title":"IEEE Winter Conference on Applications of Computer Vision","author":"Zhang Xudong","year":"2021","unstructured":"Xudong Zhang, Yutao Hu, Haochen Wang, Xianbin Cao, and Baochang Zhang. 2021. Long-range attention network for multi-view stereo. 
In IEEE Winter Conference on Applications of Computer Vision, 3782\u20133791."},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3584703"},{"key":"e_1_3_1_65_2","first-page":"42","volume-title":"2024 International Conference on 3D Vision","author":"Zhu Zihan","year":"2024","unstructured":"Zihan Zhu, Songyou Peng, Viktor Larsson, Zhaopeng Cui, Martin R. Oswald, Andreas Geiger, and Marc Pollefeys. 2024. Nicer-slam: Neural implicit scene encoding for rgb slam. In 2024 International Conference on 3D Vision, 42\u201352."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3731760","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T14:46:47Z","timestamp":1751381207000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731760"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":64,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3731760"],"URL":"https:\/\/doi.org\/10.1145\/3731760","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-08-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}