{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T10:26:12Z","timestamp":1780395972377,"version":"3.54.1"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2024,12,19]]},"abstract":"<jats:p>\n            This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field which has recently been revolutionized by repurposing diffusion priors. However, previous attempts still struggle with stochastic inference, conflicting with the deterministic nature of the Image2Normal task, and costly ensembling step, which slows down the estimation process. Our method, StableNormal, mitigates the stochasticity of the diffusion process by reducing inference variance, thus producing \"Stable-and-Sharp\" normal estimates without any additional ensembling process. StableNormal works robustly under challenging imaging conditions, such as extreme lighting, blurring, and low quality. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy, which starts with a one-step normal estimator (YOSO) to derive an initial normal guess, that is relatively coarse but reliable, then followed by a semantic-guided refinement process (SG-DRN) that refines the normals to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance in standard datasets such as DIODE-indoor, iBims, ScannetV2 and NYUv2, and also in various downstream tasks, such as surface reconstruction and normal enhancement. These results evidence that StableNormal retains both the\n            <jats:italic>\"stability\"<\/jats:italic>\n            and\n            <jats:italic>\"sharpness\"<\/jats:italic>\n            for accurate normal estimation. StableNormal represents a baby attempt to repurpose diffusion priors for\n            <jats:italic>deterministic estimation.<\/jats:italic>\n            To democratize this, code and models have been publicly available in\n            <jats:italic>hf.co\/Stable-X.<\/jats:italic>\n          <\/jats:p>","DOI":"10.1145\/3687971","type":"journal-article","created":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T15:46:04Z","timestamp":1732031164000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":52,"title":["StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7123-0220","authenticated-orcid":false,"given":"Chongjie","family":"Ye","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3250-0486","authenticated-orcid":false,"given":"Lingteng","family":"Qiu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2623-7973","authenticated-orcid":false,"given":"Xiaodong","family":"Gu","sequence":"additional","affiliation":[{"name":"Alibaba, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2711-9767","authenticated-orcid":false,"given":"Qi","family":"Zuo","sequence":"additional","affiliation":[{"name":"Alibaba, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9725-0606","authenticated-orcid":false,"given":"Yushuang","family":"Wu","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6833-9102","authenticated-orcid":false,"given":"Zilong","family":"Dong","sequence":"additional","affiliation":[{"name":"Alibaba, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3142-0189","authenticated-orcid":false,"given":"Liefeng","family":"Bo","sequence":"additional","affiliation":[{"name":"Alibaba, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0165-5909","authenticated-orcid":false,"given":"Yuliang","family":"Xiu","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Intelligent Systems, Stuttgart, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0162-3296","authenticated-orcid":false,"given":"Xiaoguang","family":"Han","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Shenzhen, Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/iccv48922.2021.01289"},{"key":"e_1_2_1_2_1","volume-title":"Rethinking Inductive Biases for Surface Normal Estimation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Bae Gwangbin","unstructured":"Gwangbin Bae and Andrew J. Davison. 2024. Rethinking Inductive Biases for Surface Normal Estimation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.642"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr.2016.642"},{"key":"e_1_2_1_5_1","volume-title":"Conference on Neural Information Processing Systems (NeurIPS) 35","author":"Bar Amir","year":"2022","unstructured":"Amir Bar, Yossi Gandelsman, Trevor Darrell, Amir Globerson, and Alexei Efros. 2022. Visual prompting via image inpainting. Conference on Neural Information Processing Systems (NeurIPS) 35 (2022), 25005--25017."},{"key":"e_1_2_1_6_1","unstructured":"Manel Baradad Yuanzhen Li Forrester Cole Michael Rubinstein Antonio Torralba William T. Freeman and Varun Jampani. 2023. Background Prompting for Improved Object Depth. arXiv:2306.05428 [cs.CV]"},{"key":"e_1_2_1_7_1","volume-title":"European Conference on Computer Vision. Springer, 552--567","author":"Cao Xu","year":"2022","unstructured":"Xu Cao, Hiroaki Santo, Boxin Shi, Fumio Okura, and Yasuyuki Matsushita. 2022. Bilateral normal integration. In European Conference on Computer Vision. Springer, 552--567."},{"key":"e_1_2_1_8_1","unstructured":"Angela Dai Angel X. Chang Manolis Savva Maciej Halber Thomas Funkhouser and Matthias Nie\u00dfner. 2017. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. arXiv:1702.04405 [cs.CV]"},{"key":"e_1_2_1_9_1","volume-title":"Objaverse: A Universe of Annotated 3D Objects. arXiv preprint arXiv:2212.08051","author":"Deitke Matt","year":"2022","unstructured":"Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli Vander-Bilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. 2022. Objaverse: A Universe of Annotated 3D Objects. arXiv preprint arXiv:2212.08051 (2022)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01061"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.304"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/iccv.2015.304"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 4025--4034","author":"Everaert Martin Nicolas","year":"2024","unstructured":"Martin Nicolas Everaert, Athanasios Fitsios, Marco Bocchio, Sami Arpa, Sabine S\u00fcsstrunk, and Radhakrishna Achanta. 2024. Exploiting the signal-leak bias in diffusion models. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 4025--4034."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision. 3392--3399","author":"Fouhey David F","year":"2013","unstructured":"David F Fouhey, Abhinav Gupta, and Martial Hebert. 2013. Data-driven 3D primitives for single image understanding. In Proceedings of the IEEE International Conference on Computer Vision. 3392--3399."},{"key":"e_1_2_1_15_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=GkJiNn2QDF","author":"Fu Stephanie","unstructured":"Stephanie Fu, Mark Hamilton, Laura E. Brandt, Axel Feldmann, Zhoutong Zhang, and William T. Freeman. 2024a. FeatUp: A Model-Agnostic Framework for Features at Any Resolution. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=GkJiNn2QDF"},{"key":"e_1_2_1_16_1","volume-title":"GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arxiv","author":"Fu Xiao","year":"2024","unstructured":"Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, and Xiaoxiao Long. 2024b. GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image. arxiv (2024)."},{"key":"e_1_2_1_17_1","volume-title":"Denoising diffusion probabilistic models. Advances in neural information processing systems 33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840--6851."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073204.1073232"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-006-0031-y"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657428"},{"key":"e_1_2_1_21_1","volume-title":"Large Scale Multi-view Stereopsis Evaluation. 2014 IEEE Conference on Computer Vision and Pattern Recognition","author":"Jensen Rasmus Ramsb\u00f8l","year":"2014","unstructured":"Rasmus Ramsb\u00f8l Jensen, A. Dahl, George Vogiatzis, Engil Tola, and Henrik Aan\u00e6s. 2014. Large Scale Multi-view Stereopsis Evaluation. 2014 IEEE Conference on Computer Vision and Pattern Recognition (2014), 406--413."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01987"},{"key":"e_1_2_1_23_1","volume-title":"Rodrigo Caye Daudt, and Konrad Schindler","author":"Ke Bingxin","year":"2024","unstructured":"Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, and Konrad Schindler. 2024a. Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation. In Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00907"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Tobias Koch Lukas Liebel Friedrich Fraundorfer and Marco K\u00f6rner. 2018. Evaluation of CNN-based Single-Image Depth Estimation Methods. arXiv:1805.01328 [cs.CV]","DOI":"10.1007\/978-3-030-11015-4_25"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Peter Kocsis Vincent Sitzmann and Matthias Nie\u00dfner. 2024. Intrinsic Image Diffusion for Single-view Material Estimation. In Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR52733.2024.00497"},{"key":"e_1_2_1_27_1","volume-title":"Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. arXiv preprint arXiv:1907.01341","author":"Lasinger Katrin","year":"2019","unstructured":"Katrin Lasinger, Ren\u00e9 Ranftl, Konrad Schindler, and Vladlen Koltun. 2019. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. arXiv preprint arXiv:1907.01341 (2019)."},{"key":"e_1_2_1_28_1","volume-title":"International Conference on Computer Vision (ICCV). 2206--2217","author":"Li Alexander C","year":"2023","unstructured":"Alexander C Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, and Deepak Pathak. 2023b. Your diffusion model is secretly a zero-shot classifier. In International Conference on Computer Vision (ICCV). 2206--2217."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3205--3215","author":"Li Yixuan","year":"2023","unstructured":"Yixuan Li, Lihan Jiang, Linning Xu, Yuanbo Xiangli, Zhenzhi Wang, Dahua Lin, and Bo Dai. 2023a. Matrixcity: A large-scale city dataset for city-scale neural rendering and beyond. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 3205--3215."},{"key":"e_1_2_1_30_1","volume-title":"International Conference on Computer Vision (ICCV). 7667--7676","author":"Li Ziyi","year":"2023","unstructured":"Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, and Weidi Xie. 2023c. Open-vocabulary object segmentation with diffusion models. In International Conference on Computer Vision (ICCV). 7667--7676."},{"key":"e_1_2_1_31_1","volume-title":"Hyperhuman: Hyper-realistic human generation with latent structural diffusion. arXiv preprint arXiv:2310.08579","author":"Liu Xian","year":"2023","unstructured":"Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, and Sergey Tulyakov. 2023. Hyperhuman: Hyper-realistic human generation with latent structural diffusion. arXiv preprint arXiv:2310.08579 (2023)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Xiaoxiao Long Yuan-Chen Guo Cheng Lin Yuan Liu Zhiyang Dou Lingjie Liu Yuexin Ma Song-Hai Zhang Marc Habermann Christian Theobalt et al. 2023. Wonder3d: Single image to 3d using cross-domain diffusion. (2023).","DOI":"10.1109\/CVPR52733.2024.00951"},{"key":"e_1_2_1_33_1","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. arXiv:1711.05101 [cs.LG]"},{"key":"e_1_2_1_34_1","unstructured":"Yuanxun Lu Jingyang Zhang Shiwei Li Tian Fang David McKinnon Yanghai Tsin Long Quan Xun Cao and Yao Yao. 2024. Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion. arXiv:2311.15980 [cs.CV]"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356528"},{"key":"e_1_2_1_36_1","unstructured":"Maxime Oquab Timoth\u00e9e Darcet Th\u00e9o Moutakanni Huy Vo Marc Szafraniec Vasil Khalidov Pierre Fernandez Daniel Haziza Francisco Massa Alaaeldin El-Nouby Mahmoud Assran Nicolas Ballas Wojciech Galuba Russell Howes Po-Yao Huang Shang-Wen Li Ishan Misra Michael Rabbat Vasu Sharma Gabriel Synnaeve Hu Xu Herv\u00e9 Jegou Julien Mairal Patrick Labatut Armand Joulin and Piotr Bojanowski. 2024. DINOv2: Learning Robust Visual Features without Supervision. arXiv:2304.07193 [cs.CV]"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"William Peebles and Saining Xie. 2022. Scalable Diffusion Models with Transformers.","DOI":"10.1109\/ICCV51070.2023.00387"},{"key":"e_1_2_1_38_1","volume-title":"International Conference on Learning Representations (ICLR)","author":"Poole Ben","year":"2023","unstructured":"Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2023. Dreamfusion: Text-to-3d using 2d diffusion. International Conference on Learning Representations (ICLR) (2023)."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9914--9925","author":"Qiu Lingteng","year":"2024","unstructured":"Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, and Xiaoguang Han. 2024. Richdreamer: A generalizable normal-depth diffusion model for detail richness in text-to-3d. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9914--9925."},{"key":"e_1_2_1_40_1","volume-title":"International conference on machine learning.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01196"},{"key":"e_1_2_1_42_1","volume-title":"Vision Transformers for Dense Prediction. International Conference on Computer Vision, International Conference on Computer Vision (Jan","author":"Ranftl Rene","year":"2021","unstructured":"Rene Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. 2021b. Vision Transformers for Dense Prediction. International Conference on Computer Vision, International Conference on Computer Vision (Jan 2021)."},{"key":"e_1_2_1_43_1","volume-title":"Nathan Paczan, Russ Webb, and Joshua M. Susskind.","author":"Roberts Mike","year":"2021","unstructured":"Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, and Joshua M. Susskind. 2021. Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding. arXiv:2011.02523 [cs.CV]"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752 [cs.CV]","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2022a. High-resolution image synthesis with latent diffusion models. In Computer Vision and Pattern Recognition (CVPR). 10684--10695.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_47_1","series-title":"Lecture Notes in Computer Science (Jan","volume-title":"U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science","author":"Ronneberger Olaf","year":"2015","unstructured":"Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, Lecture Notes in Computer Science (Jan 2015)."},{"key":"e_1_2_1_48_1","first-page":"25278","article-title":"Laion-5b: An open large-scale dataset for training next generation image-text models","volume":"35","author":"Schuhmann Christoph","year":"2022","unstructured":"Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems 35 (2022), 25278--25294.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_49_1","volume-title":"European Conference on Computer Vision.","author":"Silberman Nathan","year":"2012","unstructured":"Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. 2012. Indoor Segmentation and Support Inference from RGBD Images. In European Conference on Computer Vision."},{"key":"e_1_2_1_50_1","volume-title":"Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502","author":"Song Jiaming","year":"2020","unstructured":"Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020)."},{"key":"e_1_2_1_51_1","unstructured":"Julian Straub Thomas Whelan Lingni Ma Yufan Chen Erik Wijmans Simon Green Jakob J Engel Raul Mur-Artal Carl Ren Shobhit Verma et al. 2019. The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797 (2019)."},{"key":"e_1_2_1_52_1","volume-title":"Unsupervised Zero-Shot Segmentation using Stable Diffusion. Computer Vision and Pattern Recognition (CVPR)","author":"Tian Junjiao","year":"2024","unstructured":"Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, and Mar Gonzalez-Franco. 2024. Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion. Computer Vision and Pattern Recognition (CVPR) (2024)."},{"key":"e_1_2_1_53_1","volume-title":"DIODE: A Dense Indoor and Outdoor DEpth Dataset. arXiv:1908.00463 [cs.CV]","author":"Vasiljevic Igor","year":"2019","unstructured":"Igor Vasiljevic, Nick Kolkin, Shanyi Zhang, Ruotian Luo, Haochen Wang, Falcon Z. Dai, Andrea F. Daniele, Mohammadreza Mostajabi, Steven Basart, Matthew R. Walter, and Gregory Shakhnarovich. 2019. DIODE: A Dense Indoor and Outdoor DEpth Dataset. arXiv:1908.00463 [cs.CV]"},{"key":"e_1_2_1_54_1","volume-title":"Conference on Neural Information Processing Systems (NeurIPS)","author":"Wang Peng","year":"2021","unstructured":"Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. 2021. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction. Conference on Neural Information Processing Systems (NeurIPS) (2021)."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr42600.2020.00077"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298652"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr.2015.7298652"},{"key":"e_1_2_1_58_1","volume-title":"Conference on Neural Information Processing Systems (NeurIPS) 36","author":"Wang Zhendong","year":"2023","unstructured":"Zhendong Wang, Yifan Jiang, Yadong Lu, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou, et al. 2023. In-context learning unlocked for diffusion models. Conference on Neural Information Processing Systems (NeurIPS) 36 (2023), 8542--8562."},{"key":"e_1_2_1_59_1","volume-title":"Diffusion Models Trained with Large Data Are Transferable Visual Models. arXiv preprint arXiv:2403.06090","author":"Xu Guangkai","year":"2024","unstructured":"Guangkai Xu, Yongtao Ge, Mingyu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, and Chunhua Shen. 2024. Diffusion Models Trained with Large Data Are Transferable Visual Models. arXiv preprint arXiv:2403.06090 (2024)."},{"key":"e_1_2_1_60_1","volume-title":"IEEE International Conference on Computer Vision (ICCV).","author":"Zhang Lvmin","year":"2023","unstructured":"Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. 2023a. Adding Conditional Control to Text-to-Image Diffusion Models. In IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_1_61_1","volume-title":"I2vgen-xl: High-quality image-to-video synthesis via cascaded diffusion models. arXiv preprint arXiv:2311.04145","author":"Zhang Shiwei","year":"2023","unstructured":"Shiwei Zhang, Jiayu Wang, Yingya Zhang, Kang Zhao, Hangjie Yuan, Zhiwu Qin, Xiang Wang, Deli Zhao, and Jingren Zhou. 2023b. I2vgen-xl: High-quality image-to-video synthesis via cascaded diffusion models. arXiv preprint arXiv:2311.04145 (2023)."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr.2019.00423"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00527"},{"key":"e_1_2_1_64_1","doi-asserted-by":"crossref","unstructured":"Xin-Yang Zheng Hao Pan Yu-Xiao Guo Xin Tong and Yang Liu. 2024. MVD2: Efficient Multiview 3D Reconstruction for Multiview Diffusion. arXiv:2402.14253 [cs.CV]","DOI":"10.1145\/3641519.3657403"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687971","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3687971","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:58Z","timestamp":1750295398000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3687971"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,19]]},"references-count":64,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,19]]}},"alternative-id":["10.1145\/3687971"],"URL":"https:\/\/doi.org\/10.1145\/3687971","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,19]]},"assertion":[{"value":"2024-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}