{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T21:20:57Z","timestamp":1764969657307,"version":"3.46.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022ZD0161600"],"award-info":[{"award-number":["2022ZD0161600"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100031931","name":"Shanghai Artificial Intelligence Laboratory","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100031931","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Hong Kong RGC TRS","award":["T41-603\/20-R"],"award-info":[{"award-number":["T41-603\/20-R"]}]},{"name":"the Centre for Perceptual and Interactive Intelligence (CPII) Ltd under the Innovation and Technology Commission (ITC)?s InnoHK"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>\n                    We present SS4D, a native 4D generative model that synthesizes dynamic 3D objects directly from monocular video. Unlike prior approaches that construct 4D representations by optimizing over 3D or video generative models, we train a generator directly on 4D data, achieving high fidelity, temporal coherence, and structural consistency. At the core of our method is a compressed set of structured spacetime latents. Specifically,\n                    <jats:bold>(1)<\/jats:bold>\n                    To address the scarcity of 4D training data, we build on a pre-trained single-image-to-3D model, preserving strong spatial consistency.\n                    <jats:bold>(2)<\/jats:bold>\n                    Temporal consistency is enforced by introducing dedicated temporal layers that reason across frames.\n                    <jats:bold>(3)<\/jats:bold>\n                    To support efficient training and inference over long video sequences, we compress the latent sequence along the temporal axis using factorized 4D convolutions and temporal downsampling blocks. In addition, we employ a carefully designed training strategy to enhance robustness against occlusion and motion blur, leading to high-quality generation. Extensive experiments show that SS4D produces spatio-temporally consistent 4D objects with superior quality and efficiency, significantly outperforming state-of-the-art methods on both synthetic and real-world datasets.\n                  <\/jats:p>","DOI":"10.1145\/3763302","type":"journal-article","created":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T17:15:39Z","timestamp":1764868539000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["SS4D: Native 4D Generative Model via Structured Spacetime Latents"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-4528-5495","authenticated-orcid":false,"given":"Zhibing","family":"Li","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-0141-3939","authenticated-orcid":false,"given":"Mengchen","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"},{"name":"Shanghai Artificial Intelligence Laboratory, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5557-0623","authenticated-orcid":false,"given":"Tong","family":"Wu","sequence":"additional","affiliation":[{"name":"Stanford University, California, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8016-915X","authenticated-orcid":false,"given":"Jing","family":"Tan","sequence":"additional","affiliation":[{"name":"Chinese University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6877-5353","authenticated-orcid":false,"given":"Jiaqi","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai AI Laboratory, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8865-7896","authenticated-orcid":false,"given":"Dahua","family":"Lin","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong, Hong Kong"},{"name":"Shanghai AI Laboratory, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2025,12,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Bahmani Sherwin","year":"2024","unstructured":"Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, and David B. Lindell. 2024. 4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2024)."},{"key":"e_1_2_1_2_1","volume-title":"The 2019 DAVIS Challenge on VOS: Unsupervised Multi-Object Segmentation. arXiv:1905.00737","author":"Caelles Sergi","year":"2019","unstructured":"Sergi Caelles, Jordi Pont-Tuset, Federico Perazzi, Alberto Montes, Kevis-Kokitsi Maninis, and Luc Van Gool. 2019. The 2019 DAVIS Challenge on VOS: Unsupervised Multi-Object Segmentation. arXiv:1905.00737 (2019)."},{"key":"e_1_2_1_3_1","volume-title":"V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video. arXiv preprint arXiv:2503.09631","author":"Chen Jianqi","year":"2025","unstructured":"Jianqi Chen, Biao Zhang, Xiangjun Tang, and Peter Wonka. 2025. V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video. arXiv preprint arXiv:2503.09631 (2025)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02033"},{"key":"e_1_2_1_5_1","volume-title":"Spconv: Spatially Sparse Convolution Library. https:\/\/github.com\/traveller59\/spconv.","author":"Contributors Spconv","year":"2022","unstructured":"Spconv Contributors. 2022. Spconv: Spatially Sparse Convolution Library. https:\/\/github.com\/traveller59\/spconv."},{"key":"e_1_2_1_6_1","volume-title":"Eli VanderBilt, Aniruddha Kembhavi","author":"Deitke Matt","year":"2023","unstructured":"Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. 2023a. Objaverse-XL: A Universe of 10M+ 3D Objects. arXiv preprint arXiv:2307.05663 (2023)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01263"},{"key":"e_1_2_1_8_1","volume-title":"Lrm: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400","author":"Hong Yicong","year":"2023","unstructured":"Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. 2023. Lrm: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400 (2023)."},{"key":"e_1_2_1_9_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=sPUrdFGepF","author":"Jiang Yanqin","year":"2024","unstructured":"Yanqin Jiang, Li Zhang, Jin Gao, Weiming Hu, and Yao Yao. 2024. Consistent4D: Consistent 360\u00b0 Dynamic Object Generation from Monocular Video. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=sPUrdFGepF"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592433"},{"volume-title":"Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR.","author":"Diederik","key":"e_1_2_1_11_1","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR."},{"key":"e_1_2_1_12_1","volume-title":"Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214","author":"Li Jiahao","year":"2023","unstructured":"Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. 2023. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. arXiv preprint arXiv:2311.06214 (2023)."},{"key":"e_1_2_1_13_1","unstructured":"Yangguang Li Zi-Xin Zou Zexiang Liu Dehu Wang Yuan Liang Zhipeng Yu Xingchao Liu Yuan-Chen Guo Ding Liang Wanli Ouyang et al. 2025. TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models. arXiv preprint arXiv:2502.06608 (2025)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00623"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 300\u2013309","author":"Lin Chen-Hsuan","year":"2023","unstructured":"Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2023. Magic3d: Highresolution text-to-3d content creation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 300\u2013309."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00819"},{"key":"e_1_2_1_17_1","volume-title":"Flow Matching for Generative Modeling. In The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Lipman Yaron","year":"2023","unstructured":"Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. 2023. Flow Matching for Generative Modeling. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1\u20135, 2023."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Ben Mildenhall Pratul P. Srinivasan Matthew Tancik Jonathan T. Barron Ravi Ramamoorthi and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.","DOI":"10.1007\/978-3-030-58452-8_24"},{"key":"e_1_2_1_20_1","unstructured":"Maxime Oquab Timoth\u00e9e Darcet Th\u00e9o Moutakanni Huy Vo Marc Szafraniec Vasil Khalidov Pierre Fernandez Daniel Haziza Francisco Massa Alaaeldin El-Nouby et al. 2023. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00387"},{"key":"e_1_2_1_22_1","volume-title":"DreamFusion: Text-to-3D using 2D Diffusion. arXiv","author":"Poole Ben","year":"2022","unstructured":"Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. 2022. DreamFusion: Text-to-3D using 2D Diffusion. arXiv (2022)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00946"},{"key":"e_1_2_1_24_1","volume-title":"Dreamgaussian4d: Generative 4d gaussian splatting. arXiv preprint arXiv:2312.17142","author":"Ren Jiawei","year":"2023","unstructured":"Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, and Ziwei Liu. 2023. Dreamgaussian4d: Generative 4d gaussian splatting. arXiv preprint arXiv:2312.17142 (2023)."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of Neural Information Processing Systems(NeurIPS).","author":"Ren Jiawei","year":"2024","unstructured":"Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, and Huan Ling. 2024. L4GM: Large 4D Gaussian Reconstruction Model. In Proceedings of Neural Information Processing Systems(NeurIPS)."},{"key":"e_1_2_1_26_1","volume-title":"Einops: Clear and Reliable Tensor Manipulations with Einsteinlike Notation. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=oapKSVM2bcj","author":"Rogozhnikov Alex","year":"2022","unstructured":"Alex Rogozhnikov. 2022. Einops: Clear and Reliable Tensor Manipulations with Einsteinlike Notation. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=oapKSVM2bcj"},{"key":"e_1_2_1_27_1","volume-title":"Puppeteer: Rig and Animate Your 3D Models. arXiv preprint arXiv:2508.10898","author":"Song Chaoyue","year":"2025","unstructured":"Chaoyue Song, Xiu Li, Fan Yang, Zhongcong Xu, Jiacheng Wei, Fayao Liu, Jiashi Feng, Guosheng Lin, and Jianfeng Zhang. 2025. Puppeteer: Rig and Animate Your 3D Models. arXiv preprint arXiv:2508.10898 (2025)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.127063"},{"key":"e_1_2_1_29_1","volume-title":"Shengming Yin, Wengang Zhou, Jing Liao, and Houqiang Li.","author":"Sun Qi","year":"2024","unstructured":"Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, and Houqiang Li. 2024. Eg4d: Explicit generation of 4d object without score distillation. arXiv preprint arXiv:2405.18132 (2024)."},{"key":"e_1_2_1_30_1","volume-title":"European Conference on Computer Vision. Springer, 1\u201318","author":"Tang Jiaxiang","year":"2024","unstructured":"Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. 2024. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In European Conference on Computer Vision. Springer, 1\u201318."},{"key":"e_1_2_1_31_1","unstructured":"Tencent Hunyuan3D Team. 2025. Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation. arXiv:2501.12202 [cs.CV]"},{"key":"e_1_2_1_32_1","volume-title":"Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly.","author":"Unterthiner Thomas","year":"2018","unstructured":"Thomas Unterthiner, Sjoerd Van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, and Sylvain Gelly. 2018. Towards accurate generative models of video: A new metric & challenges. arXiv preprint arXiv:1812.01717 (2018)."},{"key":"e_1_2_1_33_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073608"},{"key":"e_1_2_1_35_1","volume-title":"Image quality assessment: from error visibility to structural similarity","author":"Wang Zhou","year":"2004","unstructured":"Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600\u2013612."},{"key":"e_1_2_1_36_1","volume-title":"Cat4d: Create anything in 4d with multi-view video diffusion models. arXiv preprint arXiv:2411.18613","author":"Wu Rundi","year":"2024","unstructured":"Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T Barron, and Aleksander Holynski. 2024. Cat4d: Create anything in 4d with multi-view video diffusion models. arXiv preprint arXiv:2411.18613 (2024)."},{"key":"e_1_2_1_37_1","volume-title":"AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation. arXiv preprint arxiv:2506.09982","author":"Wu Zijie","year":"2025","unstructured":"Zijie Wu, Chaohui Yu, Fan Wang, and Xiang. Bai. 2025. AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation. arXiv preprint arxiv:2506.09982 (2025)."},{"key":"e_1_2_1_38_1","volume-title":"Structured 3D Latents for Scalable and Versatile 3D Generation. arXiv preprint arXiv:2412.01506","author":"Xiang Jianfeng","year":"2024","unstructured":"Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. 2024. Structured 3D Latents for Scalable and Versatile 3D Generation. arXiv preprint arXiv:2412.01506 (2024)."},{"key":"e_1_2_1_39_1","volume-title":"SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency. arXiv preprint arXiv:2407.17470","author":"Xie Yiming","year":"2024","unstructured":"Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, and Varun Jampani. 2024. SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency. arXiv preprint arXiv:2407.17470 (2024)."},{"key":"e_1_2_1_40_1","volume-title":"Diffusion2: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models. arXiv preprint arXiv:2404.02148","author":"Yang Zeyu","year":"2024","unstructured":"Zeyu Yang, Zijie Pan, Chun Gu, and Li Zhang. 2024a. Diffusion2: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models. arXiv preprint arXiv:2404.02148 (2024)."},{"key":"e_1_2_1_41_1","unstructured":"Zhuoyi Yang Jiayan Teng Wendi Zheng Ming Ding Shiyu Huang Jiazheng Xu Yuanming Yang Wenyi Hong Xiaohan Zhang Guanyu Feng et al. 2024b. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. arXiv preprint arXiv:2408.06072 (2024)."},{"key":"e_1_2_1_42_1","volume-title":"Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation. arXiv preprint arXiv:2503.16396","author":"Yao Chun-Han","year":"2025","unstructured":"Chun-Han Yao, Yiming Xie, Vikram Voleti, Huaizu Jiang, and Varun Jampani. 2025. SV4D2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation. arXiv preprint arXiv:2503.16396 (2025)."},{"key":"e_1_2_1_43_1","first-page":"45256","article-title":"4real: Towards photorealistic 4d scene generation via video diffusion models","volume":"37","author":"Yu Heng","year":"2024","unstructured":"Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, L\u00e1szl\u00f3 Jeni, Sergey Tulyakov, and Hsin-Ying Lee. 2024. 4real: Towards photorealistic 4d scene generation via video diffusion models. Advances in Neural Information Processing Systems 37 (2024), 45256\u201345280.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Yifei Zeng Yanqin Jiang Siyu Zhu Yuanxun Lu Youtian Lin Hao Zhu Weiming Hu Xun Cao and Yao Yao. 2024. STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians. (2024). arXiv:2403.14939 [cs.CV]","DOI":"10.1007\/978-3-031-72764-1_10"},{"key":"e_1_2_1_45_1","volume-title":"Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis. arXiv preprint arXiv:2507.23785","author":"Zhang Bowen","year":"2025","unstructured":"Bowen Zhang, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, and Baining Guo. 2025. Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis. arXiv preprint arXiv:2507.23785 (2025)."},{"key":"e_1_2_1_46_1","volume-title":"4Diffusion: Multi-view Video Diffusion Model for 4D Generation. arXiv preprint arXiv:2405.20674","author":"Zhang Haiyu","year":"2024","unstructured":"Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, and Yu Qiao. 2024b. 4Diffusion: Multi-view Video Diffusion Model for 4D Generation. arXiv preprint arXiv:2405.20674 (2024)."},{"key":"e_1_2_1_47_1","volume-title":"European Conference on Computer Vision","author":"Zhang Kai","year":"2024","unstructured":"Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. 2024a. GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting. European Conference on Computer Vision (2024)."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3658146"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_2_1_50_1","volume-title":"Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation. Advances in neural information processing systems 36","author":"Zhao Zibo","year":"2023","unstructured":"Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, and Shenghua Gao. 2023. Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation. Advances in neural information processing systems 36 (2023), 73969\u201373982."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00697"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3763302","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T21:16:15Z","timestamp":1764969375000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3763302"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12]]},"references-count":51,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1145\/3763302"],"URL":"https:\/\/doi.org\/10.1145\/3763302","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2025,12]]},"assertion":[{"value":"2025-05-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}