{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T09:30:49Z","timestamp":1780392649153,"version":"3.54.1"},"reference-count":149,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022YFF0902301"],"award-info":[{"award-number":["2022YFF0902301"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NSFC programs","award":["61976138"],"award-info":[{"award-number":["61976138"]}]},{"name":"NSFC programs","award":["61977047"],"award-info":[{"award-number":["61977047"]}]},{"DOI":"10.13039\/501100003399","name":"STCSM","doi-asserted-by":"crossref","award":["2015F0203-000-06"],"award-info":[{"award-number":["2015F0203-000-06"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"crossref"}]},{"name":"SHMEC","award":["2019-01-07-00-01-E00003"],"award-info":[{"award-number":["2019-01-07-00-01-E00003"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2025,8,1]]},"abstract":"<jats:p>3D creation has always been a unique human strength, driven by our ability to deconstruct and reassemble objects using our eyes, mind and hand. However, current 3D design tools struggle to replicate this natural process, requiring considerable artistic expertise and manual labor. This paper introduces BANG, a novel generative approach that bridges 3D generation and reasoning, allowing for intuitive and flexible part-level decomposition of 3D objects. At the heart of BANG is \"Generative Exploded Dynamics\", which creates a smooth sequence of exploded states for an input geometry, progressively separating parts while preserving their geometric and semantic coherence. 
BANG utilizes a pre-trained large-scale latent diffusion model, fine-tuned for exploded dynamics with a lightweight exploded view adapter, allowing precise control over the decomposition process. It also incorporates a temporal attention module to ensure smooth transitions and consistency across time. BANG enhances control with spatial prompts, such as bounding boxes and surface regions, enabling users to specify which parts to decompose and how. This interaction can be extended with multimodal models like GPT-4, enabling 2D-to-3D manipulations for more intuitive and creative workflows. The capabilities of BANG extend to generating detailed part-level geometry, associating parts with functional descriptions, and facilitating component-aware 3D creation and manufacturing workflows. Additionally, BANG offers applications in 3D printing, where separable parts are generated for easy printing and reassembly. In essence, BANG enables seamless transformation from imaginative concepts to detailed 3D assets, offering a new perspective on creation that resonates with human intuition.<\/jats:p>","DOI":"10.1145\/3730840","type":"journal-article","created":{"date-parts":[[2025,7,27]],"date-time":"2025-07-27T04:02:41Z","timestamp":1753588961000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["BANG: Dividing 3D Assets via Generative Exploded Dynamics"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8508-3359","authenticated-orcid":false,"given":"Longwen","family":"Zhang","sequence":"first","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"},{"name":"Deemos Technology, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4837-7152","authenticated-orcid":false,"given":"Qixuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, 
China"},{"name":"Deemos Technology, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-9673-8545","authenticated-orcid":false,"given":"Haoran","family":"Jiang","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"},{"name":"Deemos Technology, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3434-212X","authenticated-orcid":false,"given":"Yinuo","family":"Bai","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"},{"name":"Deemos Technology, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1189-1254","authenticated-orcid":false,"given":"Wei","family":"Yang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8807-7787","authenticated-orcid":false,"given":"Lan","family":"Xu","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9198-6853","authenticated-orcid":false,"given":"Jingyi","family":"Yu","sequence":"additional","affiliation":[{"name":"ShanghaiTech University, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,7,27]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01392"},{"key":"e_1_2_2_2_1","volume-title":"Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al.","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. 
Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)."},{"key":"e_1_2_2_3_1","volume-title":"Gencad: Image-conditioned computer-aided design generation with transformer-based contrastive representation and diffusion priors. arXiv preprint arXiv:2409.16294","author":"Alam Md Ferdous","year":"2024","unstructured":"Md Ferdous Alam and Faez Ahmed. 2024. Gencad: Image-conditioned computer-aided design generation with transformer-based contrastive representation and diffusion priors. arXiv preprint arXiv:2409.16294 (2024)."},{"key":"e_1_2_2_4_1","volume-title":"Sai Sravan Yarlagadda, and Amir Barati Farimani","author":"Badagabettu Akshay","year":"2024","unstructured":"Akshay Badagabettu, Sai Sravan Yarlagadda, and Amir Barati Farimani. 2024. Query2CAD: Generating CAD models using natural language queries. arXiv preprint arXiv:2406.00144 (2024)."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00764"},{"key":"e_1_2_2_6_1","unstructured":"Bambu Lab. 2022. X1 Carbon. https:\/\/bambulab.com\/en\/x1."},{"key":"e_1_2_2_7_1","unstructured":"Black Forest Labs. 2023. FLUX. https:\/\/github.com\/black-forest-labs\/flux."},{"key":"e_1_2_2_8_1","unstructured":"Blender Foundation. Ongoing. Blender - A Free and Open Source 3D Creation Suite. https:\/\/www.blender.org\/. Accessed: 2025-01-20."},{"key":"e_1_2_2_9_1","volume-title":"Exploded views for","author":"Bruckner Stefan","year":"2006","unstructured":"Stefan Bruckner and M Eduard Groller. 2006. Exploded views for volume data. IEEE transactions on visualization and computer graphics 12, 5 (2006), 1077\u20131084."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01937"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"e_1_2_2_12_1","unstructured":"Jiazhong Cen Zanwei Zhou Jiemin Fang Chen Yang Wei Shen Lingxi Xie Dongsheng Jiang Xiaopeng Zhang and Qi Tian. 2023. Segment Anything in 3D with NeRFs. 
In NeurIPS."},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00389"},{"key":"e_1_2_2_14_1","volume-title":"Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012","author":"Chang Angel X","year":"2015","unstructured":"Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. 2015. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)."},{"key":"e_1_2_2_15_1","volume-title":"PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models. arXiv preprint arXiv:2412.18608","author":"Chen Minghao","year":"2024","unstructured":"Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotny, and Andrea Vedaldi. 2024c. PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models. arXiv preprint arXiv:2412.18608 (2024)."},{"key":"e_1_2_2_16_1","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=Gcks157FI3","author":"Chen Sijin","year":"2024","unstructured":"Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Zhibin Wang, Jingyi Yu, Gang Yu, BIN FU, and Tao Chen. 2024a. MeshXL: Neural Coordinate Field for Generative 3D Foundation Models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=Gcks157FI3"},{"key":"e_1_2_2_17_1","unstructured":"Yiwen Chen Tong He Di Huang Weicai Ye Sijin Chen Jiaxiang Tang Xin Chen Zhongang Cai Lei Yang Gang Yu et al. 2024b. MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers. 
arXiv preprint arXiv:2406.10163 (2024)."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72691-0_8"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2408.02555"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02022"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2403.06738"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW60793.2023.00314"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01263"},{"key":"e_1_2_2_24_1","volume-title":"DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow. arXiv preprint arXiv:2411.16820","author":"Deng Ken","year":"2024","unstructured":"Ken Deng, Yuanchen Guo, Jingxiang Sun, Zixin Zou, Yangguang Li, Xin Cai, Yanpei Cao, Yebin Liu, and Ding Liang. 2024. DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow. arXiv preprint arXiv:2411.16820 (2024)."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73030-6_2"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/3692070.3692570"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01315"},{"key":"e_1_2_2_28_1","volume-title":"Eslam Mohamed Bakr, and Mohamed Elhoseiny","author":"Fei Junjie","year":"2024","unstructured":"Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed Bakr, and Mohamed Elhoseiny. 2024. Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding. arXiv preprint arXiv:2405.18937 (2024)."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356488"},{"key":"e_1_2_2_30_1","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems. 
https:\/\/openreview.net\/forum?id=TFZlFRl9Ks","author":"Gao Ruiqi","year":"2024","unstructured":"Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin Brualla, Pratul P. Srinivasan, Jonathan T. Barron, and Ben Poole. 2024. CAT3D: Create Anything in 3D with Multi-View Diffusion Models. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=TFZlFRl9Ks"},{"key":"e_1_2_2_31_1","volume-title":"International Conference on Machine Learning. PMLR, 11808\u201311826","author":"Gu Jiatao","year":"2023","unstructured":"Jiatao Gu, Alex Trevithick, Kai-En Lin, Joshua M Susskind, Christian Theobalt, Lingjie Liu, and Ravi Ramamoorthi. 2023. Nerfdiff: Single-image view synthesis with nerf-guided distillation from 3d-aware diffusion. In International Conference on Machine Learning. PMLR, 11808\u201311826."},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2835487"},{"key":"e_1_2_2_33_1","volume-title":"REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment. arXiv preprint arXiv:2405.18525","author":"Han Haonan","year":"2024","unstructured":"Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, and Wanhua Li. 2024. REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment. arXiv preprint arXiv:2405.18525 (2024)."},{"key":"e_1_2_2_34_1","volume-title":"Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale. arXiv preprint arXiv:2412.09548","author":"Hao Zekun","year":"2024","unstructured":"Zekun Hao, David W Romero, Tsung-Yi Lin, and Ming-Yu Liu. 2024. Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale. 
arXiv preprint arXiv:2412.09548 (2024)."},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530084"},{"key":"e_1_2_2_36_1","first-page":"20482","article-title":"3d-llm: Injecting the 3d world into large language models","volume":"36","author":"Hong Yining","year":"2023","unstructured":"Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, and Chuang Gan. 2023. 3d-llm: Injecting the 3d world into large language models. Advances in Neural Information Processing Systems 36 (2023), 20482\u201320494.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_37_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"Huang Yukun","year":"2023","unstructured":"Yukun Huang, Jianan Wang, Yukai Shi, Boshi Tang, Xianbiao Qi, and Lei Zhang. 2023. Dreamtime: An improved optimization strategy for diffusion-guided 3d generation. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2023.XIX.066"},{"key":"e_1_2_2_39_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=sPUrdFGepF","author":"Jiang Yanqin","year":"2024","unstructured":"Yanqin Jiang, Li Zhang, Jin Gao, Weiming Hu, and Yao Yao. 2024. Consistent4D: Consistent 360\u00b0 Dynamic Object Generation from Monocular Video. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=sPUrdFGepF"},{"key":"e_1_2_2_40_1","volume-title":"Shap-e: Generating conditional 3d implicit functions. arXiv preprint arXiv:2305.02463","author":"Jun Heewoo","year":"2023","unstructured":"Heewoo Jun and Alex Nichol. 2023. Shap-e: Generating conditional 3d implicit functions. 
arXiv preprint arXiv:2305.02463 (2023)."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2010.151"},{"key":"e_1_2_2_42_1","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=5k9XeHIK3L","author":"Khan Mohammad Sadil","year":"2024","unstructured":"Mohammad Sadil Khan, Sankalp Sinha, Sheikh Talha Uddin, Didier Stricker, Sk Aziz Ali, and Muhammad Zeshan Afzal. 2024. Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts. In The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=5k9XeHIK3L"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01328"},{"key":"e_1_2_2_45_1","first-page":"1","article-title":"Free2cad: Parsing freehand drawings into cad commands","volume":"41","author":"Li Changjian","year":"2022","unstructured":"Changjian Li, Hao Pan, Adrien Bousseau, and Niloy J Mitra. 2022a. Free2cad: Parsing freehand drawings into cad commands. ACM Transactions on Graphics (TOG) 41, 4 (2022), 1\u201316.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_2_46_1","unstructured":"Jiahao Li Hao Tan Kai Zhang Zexiang Xu Fujun Luan Yinghao Xu Yicong Hong Kalyan Sunkavalli Greg Shakhnarovich and Sai Bi. 2024c. Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model. In ICLR. https:\/\/openreview.net\/forum?id=2lDQLiH1W4"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01069"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360700"},{"key":"e_1_2_2_49_1","volume-title":"Proceedings of Graphics Interface","author":"Li Wilmot","year":"2004","unstructured":"Wilmot Li, Maneesh Agrawala, and David Salesin. 2004. Interactive image-based exploded view diagrams. 
In Proceedings of Graphics Interface 2004. 203\u2013212."},{"key":"e_1_2_2_50_1","volume-title":"CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner. arXiv preprint arXiv:2405.14979","author":"Li Weiyu","year":"2024","unstructured":"Weiyu Li, Jiarui Liu, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, and Xiaoxiao Long. 2024b. CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner. arXiv preprint arXiv:2405.14979 (2024)."},{"key":"e_1_2_2_51_1","volume-title":"Proceedings of the 32nd International Conference on Neural Information Processing Systems","author":"Li Yangyan","year":"2018","unstructured":"Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. 2018. PointCNN: convolution on X-transformed points. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (Montr\u00e9al, Canada) (NIPS'18). Curran Associates Inc., Red Hook, NY, USA, 828\u2013838."},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i4.28113"},{"key":"e_1_2_2_53_1","volume-title":"Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models. arXiv preprint arXiv:2405.16645","author":"Liang Hanwen","year":"2024","unstructured":"Hanwen Liang, Yuyang Yin, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N Plataniotis, Yao Zhao, and Yunchao Wei. 2024. Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models. arXiv preprint arXiv:2405.16645 (2024)."},{"key":"e_1_2_2_54_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 300\u2013309","author":"Lin Chen-Hsuan","year":"2023","unstructured":"Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2023. Magic3d: High-resolution text-to-3d content creation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 300\u2013309."},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657482"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00960"},{"key":"e_1_2_2_57_1","volume-title":"Openshape: Scaling up 3d shape representation towards open-world understanding. Advances in neural information processing systems 36","author":"Liu Minghua","year":"2024","unstructured":"Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, and Hao Su. 2024d. Openshape: Scaling up 3d shape representation towards open-world understanding. Advances in neural information processing systems 36 (2024)."},{"key":"e_1_2_2_58_1","volume-title":"One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems 36","author":"Liu Minghua","year":"2024","unstructured":"Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. 2024e. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02082"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00853"},{"key":"e_1_2_2_61_1","volume-title":"Syncdreamer: Generating multiview-consistent images from a single-view image. In ICLR.","author":"Liu Yuan","year":"2024","unstructured":"Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. 2024b. Syncdreamer: Generating multiview-consistent images from a single-view image. In ICLR."},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00951"},{"key":"e_1_2_2_63_1","volume-title":"Scalable 3d captioning with pretrained models. 
Advances in Neural Information Processing Systems 36","author":"Luo Tiange","year":"2024","unstructured":"Tiange Luo, Chris Rockwell, Honglak Lee, and Justin Johnson. 2024. Scalable 3d captioning with pretrained models. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_2_64_1","volume-title":"Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, et al.","author":"Ma Xianzheng","year":"2024","unstructured":"Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, et al. 2024. When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models. arXiv preprint arXiv:2405.10255 (2024)."},{"key":"e_1_2_2_65_1","volume-title":"Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=3Pbra-_u76D","author":"Ma Xu","year":"2022","unstructured":"Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, and Yun Fu. 2022. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=3Pbra-_u76D"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00816"},{"key":"e_1_2_2_67_1","volume-title":"Siggraph Asia 2019 38, 6","author":"Mo Kaichun","year":"2019","unstructured":"Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, and Leonidas Guibas. 2019a. StructureNet: Hierarchical Graph Networks for 3D Shape Generation. 
ACM Transactions on Graphics (TOG), Siggraph Asia 2019 38, 6 (2019), Article 242."},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00100"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01311"},{"key":"e_1_2_2_70_1","volume-title":"International conference on machine learning. PMLR, 7220\u20137229","author":"Nash Charlie","year":"2020","unstructured":"Charlie Nash, Yaroslav Ganin, SM Ali Eslami, and Peter Battaglia. 2020. Polygen: An autoregressive generative model of 3d meshes. In International conference on machine learning. PMLR, 7220\u20137229."},{"key":"e_1_2_2_71_1","volume-title":"Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751","author":"Nichol Alex","year":"2022","unstructured":"Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. 2022. Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751 (2022)."},{"key":"e_1_2_2_72_1","unstructured":"OpenAI. 2023. DALL-E 3. https:\/\/openai.com\/index\/dall-e-3\/."},{"key":"e_1_2_2_73_1","volume-title":"DINOv2: Learning Robust Visual Features without Supervision. Transactions on Machine Learning Research","author":"Oquab Maxime","year":"2024","unstructured":"Maxime Oquab, Timoth\u00e9e Darcet, Th\u00e9o Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel HAZIZA, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, and Piotr Bojanowski. 2024. DINOv2: Learning Robust Visual Features without Supervision. Transactions on Machine Learning Research (2024). 
https:\/\/openreview.net\/forum?id=a68SUt6zFt Featured Certification."},{"key":"e_1_2_2_74_1","volume-title":"Fast dynamic 3d object generation from a single-view video. arXiv preprint arXiv:2401.08742","author":"Pan Zijie","year":"2024","unstructured":"Zijie Pan, Zeyu Yang, Xiatian Zhu, and Li Zhang. 2024. Fast dynamic 3d object generation from a single-view video. arXiv preprint arXiv:2401.08742 (2024)."},{"key":"e_1_2_2_75_1","volume-title":"Anise: Assembly-based neural implicit surface reconstruction","author":"Petrov Dmitry","year":"2023","unstructured":"Dmitry Petrov, Matheus Gadelha, Radom\u00edr M\u011bch, and Evangelos Kalogerakis. 2023. Anise: Assembly-based neural implicit surface reconstruction. IEEE Transactions on Visualization and Computer Graphics (2023)."},{"key":"e_1_2_2_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/3DV62453.2024.00026"},{"key":"e_1_2_2_77_1","volume-title":"The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=FjNys5c7VyY","author":"Poole Ben","year":"2023","unstructured":"Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. 2023. DreamFusion: Text-to-3D using 2D Diffusion. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=FjNys5c7VyY"},{"key":"e_1_2_2_78_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 652\u2013660","author":"Qi Charles R","year":"2017","unstructured":"Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. 2017a. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 652\u2013660."},{"key":"e_1_2_2_79_1","volume-title":"Deep hierarchical feature learning on point sets in a metric space. 
Advances in neural information processing systems 30","author":"Qi Charles Ruizhongtai","year":"2017","unstructured":"Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. 2017b. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_2_80_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72775-7_13"},{"key":"e_1_2_2_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02495"},{"key":"e_1_2_2_82_1","volume-title":"GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models. arXiv preprint arXiv:2501.01428","author":"Qi Zhangyang","year":"2025","unstructured":"Zhangyang Qi, Zhixiong Zhang, Ye Fang, Jiaqi Wang, and Hengshuang Zhao. 2025b. GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models. arXiv preprint arXiv:2501.01428 (2025)."},{"key":"e_1_2_2_83_1","volume-title":"Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Advances in neural information processing systems 35","author":"Qian Guocheng","year":"2022","unstructured":"Guocheng Qian, Yuchen Li, Houwen Peng, Jinjie Mai, Hasan Hammoud, Mohamed Elhoseiny, and Bernard Ghanem. 2022. Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Advances in neural information processing systems 35 (2022), 23192\u201323204."},{"key":"e_1_2_2_84_1","volume-title":"The Twelfth International Conference on Learning Representations (ICLR). https:\/\/openreview.net\/forum?id=0jHkUDyEO9","author":"Qian Guocheng","year":"2024","unstructured":"Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. 2024. Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors. In The Twelfth International Conference on Learning Representations (ICLR). 
https:\/\/openreview.net\/forum?id=0jHkUDyEO9"},{"key":"e_1_2_2_85_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00946"},{"key":"e_1_2_2_86_1","volume-title":"International conference on machine learning. PMLR, 8748\u20138763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_2_2_87_1","volume-title":"Bringing Objects to Life: 4D generation from 3D objects. arXiv preprint arXiv:2412.20422","author":"Rahamim Ohad","year":"2024","unstructured":"Ohad Rahamim, Ori Malca, Dvir Samuel, and Gal Chechik. 2024. Bringing Objects to Life: 4D generation from 3D objects. arXiv preprint arXiv:2412.20422 (2024)."},{"key":"e_1_2_2_88_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00223"},{"key":"e_1_2_2_89_1","unstructured":"Nikhila Ravi Valentin Gabeur Yuan-Ting Hu Ronghang Hu Chaitanya Ryali Tengyu Ma Haitham Khedr Roman R\u00e4dle Chloe Rolland Laura Gustafson et al. 2024. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 (2024)."},{"key":"e_1_2_2_90_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2312.17142"},{"key":"e_1_2_2_91_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00403"},{"key":"e_1_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.21062\/mft.2021.067"},{"key":"e_1_2_2_93_1","volume-title":"a single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110","author":"Shi Ruoxi","year":"2023","unstructured":"Ruoxi Shi, Hansheng Chen, Zhuoyang Zhang, Minghua Liu, Chao Xu, Xinyue Wei, Linghao Chen, Chong Zeng, and Hao Su. 2023. Zero123++: a single image to consistent multi-view diffusion base model. 
arXiv preprint arXiv:2310.15110 (2023)."},{"key":"e_1_2_2_94_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=FUgrjq2pbB","author":"Shi Yichun","year":"2024","unstructured":"Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, and Xiao Yang. 2024. MVDream: Multi-view Diffusion for 3D Generation. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=FUgrjq2pbB"},{"key":"e_1_2_2_95_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01855"},{"key":"e_1_2_2_96_1","doi-asserted-by":"publisher","DOI":"10.5555\/3618408.3619731"},{"key":"e_1_2_2_97_1","unstructured":"Ay\u00e7a Takmaz Elisabetta Fedele Robert W. Sumner Marc Pollefeys Federico Tombari and Francis Engelmann. 2023. OpenMask3D: Open-Vocabulary 3D Instance Segmentation. In Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_2_2_98_1","volume-title":"Segment any mesh: Zero-shot mesh part segmentation via lifting segment anything 2 to 3d. arXiv preprint arXiv:2408.13679","author":"Tang George","year":"2024","unstructured":"George Tang, William Zhao, Logan Ford, David Benhaim, and Paul Zhang. 2024c. Segment any mesh: Zero-shot mesh part segmentation via lifting segment anything 2 to 3d. arXiv preprint arXiv:2408.13679 (2024)."},{"key":"e_1_2_2_99_1","volume-title":"Edgerunner: Auto-regressive auto-encoder for artistic mesh generation. arXiv preprint arXiv:2409.18114","author":"Tang Jiaxiang","year":"2024","unstructured":"Jiaxiang Tang, Zhaoshuo Li, Zekun Hao, Xian Liu, Gang Zeng, Ming-Yu Liu, and Qinsheng Zhang. 2024b. Edgerunner: Auto-regressive auto-encoder for artistic mesh generation. 
arXiv preprint arXiv:2409.18114 (2024)."},{"key":"e_1_2_2_100_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02086"},{"key":"e_1_2_2_101_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72640-8_10"},{"key":"e_1_2_2_102_1","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681257"},{"key":"e_1_2_2_103_1","volume-title":"European Conference on Computer Vision. Springer, 149\u2013166","author":"Thai Anh","year":"2025","unstructured":"Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, James M Rehg, and Matt Feiszli. 2025. 3x2: 3D Object Part Segmentation by 2D Semantic Correspondences. In European Conference on Computer Vision. Springer, 149\u2013166."},{"key":"e_1_2_2_104_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00333"},{"key":"e_1_2_2_105_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01155"},{"key":"e_1_2_2_106_1","volume-title":"Cg3d: Compositional generation for text-to-3d via gaussian splatting. arXiv preprint arXiv:2311.17907","author":"Vilesov Alexander","year":"2023","unstructured":"Alexander Vilesov, Pradyumna Chari, and Achuta Kadambi. 2023. Cg3d: Compositional generation for text-to-3d via gaussian splatting. arXiv preprint arXiv:2311.17907 (2023)."},{"key":"e_1_2_2_107_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01214"},{"key":"e_1_2_2_108_1","volume-title":"Luciddreaming: Controllable object-centric 3d generation. arXiv preprint arXiv:2312.00588","author":"Wang Zhaoning","year":"2023","unstructured":"Zhaoning Wang, Ming Li, and Chen Chen. 2023b. Luciddreaming: Controllable object-centric 3d generation. arXiv preprint arXiv:2312.00588 (2023)."},{"key":"e_1_2_2_109_1","volume-title":"Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems 36","author":"Wang Zhengyi","year":"2024","unstructured":"Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. 
2024. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_2_110_1","volume-title":"Novel View Synthesis with Diffusion Models. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=HtoA0oT30jC","author":"Watson Daniel","year":"2023","unstructured":"Daniel Watson, William Chan, Ricardo Martin Brualla, Jonathan Ho, Andrea Tagliasacchi, and Mohammad Norouzi. 2023. Novel View Synthesis with Diffusion Models. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=HtoA0oT30jC"},{"key":"e_1_2_2_111_1","volume-title":"The Thirteenth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=WAC8LmlKYf","author":"Weng Haohan","year":"2025","unstructured":"Haohan Weng, Yikai Wang, Tong Zhang, C. L. Philip Chen, and Jun Zhu. 2025. PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance. In The Thirteenth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=WAC8LmlKYf"},{"key":"e_1_2_2_112_1","doi-asserted-by":"crossref","unstructured":"Haohan Weng, Zibo Zhao, Biwen Lei, Xianghui Yang, Jian Liu, Zeqiang Lai, Zhuo Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, et al. 2024. Scaling Mesh Generation via Compressive Tokenization. arXiv preprint arXiv:2411.07025 (2024).","DOI":"10.1109\/CVPR52734.2025.01036"},{"key":"e_1_2_2_113_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00091"},{"key":"e_1_2_2_114_1","volume-title":"The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=vCOgjBIZuL","author":"Wu Shuang","year":"2024","unstructured":"Shuang Wu, Youtian Lin, Yifei Zeng, Feihu Zhang, Jingxi Xu, Philip Torr, Xun Cao, and Yao Yao. 2024b. Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer. 
In The Thirty-eighth Annual Conference on Neural Information Processing Systems. https:\/\/openreview.net\/forum?id=vCOgjBIZuL"},{"key":"e_1_2_2_115_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00463"},{"key":"e_1_2_2_116_1","first-page":"33330","article-title":"Point transformer v2: Grouped vector attention and partition-based pooling","volume":"35","author":"Wu Xiaoyang","year":"2022","unstructured":"Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Hengshuang Zhao. 2022. Point transformer v2: Grouped vector attention and partition-based pooling. Advances in Neural Information Processing Systems 35 (2022), 33330\u201333342.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_117_1","first-page":"1","article-title":"Sagnet: Structure-aware generative network for 3d-shape modeling","volume":"38","author":"Wu Zhijie","year":"2019","unstructured":"Zhijie Wu, Xiang Wang, Di Lin, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. 2019. Sagnet: Structure-aware generative network for 3d-shape modeling. ACM Transactions on Graphics (TOG) 38, 4 (2019), 1\u201314.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_2_118_1","volume-title":"Structured 3D Latents for Scalable and Versatile 3D Generation. arXiv preprint arXiv:2412.01506","author":"Xiang Jianfeng","year":"2024","unstructured":"Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. 2024. Structured 3D Latents for Scalable and Versatile 3D Generation. 
arXiv preprint arXiv:2412.01506 (2024)."},{"key":"e_1_2_2_119_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00226"},{"key":"e_1_2_2_120_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00461"},{"key":"e_1_2_2_121_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00435"},{"key":"e_1_2_2_122_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.294"},{"key":"e_1_2_2_123_1","volume-title":"CADMLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM. arXiv preprint arXiv:2411.04954","author":"Xu Jingwei","year":"2024","unstructured":"Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, and Shenghua Gao. 2024. CADMLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM. arXiv preprint arXiv:2411.04954 (2024)."},{"key":"e_1_2_2_124_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72698-9_8"},{"key":"e_1_2_2_125_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00120"},{"key":"e_1_2_2_126_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02558"},{"key":"e_1_2_2_127_1","volume-title":"PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image. arXiv preprint arXiv:2411.18548","author":"Yan Han","year":"2024","unstructured":"Han Yan, Mingrui Zhang, Yang Li, Chao Ma, and Pan Ji. 2024a. PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image. arXiv preprint arXiv:2411.18548 (2024)."},{"key":"e_1_2_2_128_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73254-6_8"},{"key":"e_1_2_2_129_1","volume-title":"Sampart3d: Segment any part in 3d objects. arXiv preprint arXiv:2411.07184","author":"Yang Yunhan","year":"2024","unstructured":"Yunhan Yang, Yukun Huang, Yuan-Chen Guo, Liangjun Lu, Xiaoyang Wu, Edmund Y Lam, Yan-Pei Cao, and Xihui Liu. 2024. Sampart3d: Segment any part in 3d objects. arXiv preprint arXiv:2411.07184 (2024)."},{"key":"e_1_2_2_130_1","volume-title":"Sam3d: Segment anything in 3d scenes. 
arXiv preprint arXiv:2306.03908","author":"Yang Yunhan","year":"2023","unstructured":"Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, and Xihui Liu. 2023. Sam3d: Segment anything in 3d scenes. arXiv preprint arXiv:2306.03908 (2023)."},{"key":"e_1_2_2_131_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00649"},{"key":"e_1_2_2_132_1","volume-title":"Shapegpt: 3d shape generation with a unified multi-modal language model. arXiv preprint arXiv:2311.17618","author":"Yin Fukun","year":"2023","unstructured":"Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, and Tao Chen. 2023a. Shapegpt: 3d shape generation with a unified multi-modal language model. arXiv preprint arXiv:2311.17618 (2023)."},{"key":"e_1_2_2_133_1","volume-title":"4dgen: Grounded 4d content generation with spatial-temporal consistency. arXiv preprint arXiv:2312.17225","author":"Yin Yuyang","year":"2023","unstructured":"Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, and Yunchao Wei. 2023b. 4dgen: Grounded 4d content generation with spatial-temporal consistency. arXiv preprint arXiv:2312.17225 (2023)."},{"key":"e_1_2_2_134_1","volume-title":"Jiaqi Han, Rahul Thomas, Haotong Zhang, Suya You, and Leonidas Guibas.","author":"You Yang","year":"2024","unstructured":"Yang You, Mikaela Angelina Uy, Jiaqi Han, Rahul Thomas, Haotong Zhang, Suya You, and Leonidas Guibas. 2024. Img2cad: Reverse engineering 3d cad models from images through vlm-assisted conditional factorization. 
arXiv preprint arXiv:2408.01437 (2024)."},{"key":"e_1_2_2_135_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72764-1_10"},{"key":"e_1_2_2_136_1","doi-asserted-by":"publisher","DOI":"10.1145\/3618342"},{"key":"e_1_2_2_137_1","first-page":"36067","article-title":"Glipv2: Unifying localization and vision-language understanding","volume":"35","author":"Zhang Haotian","year":"2022","unstructured":"Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, and Jianfeng Gao. 2022b. Glipv2: Unifying localization and vision-language understanding. Advances in Neural Information Processing Systems 35 (2022), 36067\u201336080.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_138_1","doi-asserted-by":"publisher","DOI":"10.1145\/3658146"},{"key":"e_1_2_2_139_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00836"},{"key":"e_1_2_2_140_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02085"},{"key":"e_1_2_2_141_1","volume-title":"DNF: Unconditional 4D Generation with Dictionary-based Neural Fields. arXiv preprint arXiv:2412.05161","author":"Zhang Xinyi","year":"2024","unstructured":"Xinyi Zhang, Naiqi Li, and Angela Dai. 2024a. DNF: Unconditional 4D Generation with Dictionary-based Neural Fields. arXiv preprint arXiv:2412.05161 (2024)."},{"key":"e_1_2_2_142_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01595"},{"key":"e_1_2_2_143_1","volume-title":"Animate124: Animating one image to 4d dynamic scene. arXiv preprint arXiv:2311.14603","author":"Zhao Yuyang","year":"2023","unstructured":"Yuyang Zhao, Zhiwen Yan, Enze Xie, Lanqing Hong, Zhenguo Li, and Gim Hee Lee. 2023. Animate124: Animating one image to 4d dynamic scene. 
arXiv preprint arXiv:2311.14603 (2023)."},{"key":"e_1_2_2_144_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592103"},{"key":"e_1_2_2_145_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72980-5_11"},{"key":"e_1_2_2_146_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Zhou Junsheng","year":"2024","unstructured":"Junsheng Zhou, Jinsheng Wang, Baorui Ma, Yu-Shen Liu, Tiejun Huang, and Xinlong Wang. 2024. Uni3d: Exploring unified 3d representation at scale. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_2_147_1","volume-title":"The Thirteenth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=yXCTDhZDh6","author":"Zhou Yuchen","year":"2025","unstructured":"Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, and Hao Su. 2025. Point-SAM: Promptable 3D Segmentation Model for Point Clouds. In The Thirteenth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=yXCTDhZDh6"},{"key":"e_1_2_2_148_1","volume-title":"Enhancing Low-Shot 3D Part Segmentation via Multi-View Instance Segmentation and Maximum Likelihood Estimation. arXiv preprint arXiv:2312.03015","author":"Zhou Yuchen","year":"2023","unstructured":"Yuchen Zhou, Jiayuan Gu, Xuanlin Li, Minghua Liu, Yunhao Fang, and Hao Su. 2023. PartSLIP++: Enhancing Low-Shot 3D Part Segmentation via Multi-View Instance Segmentation and Maximum Likelihood Estimation. 
arXiv preprint arXiv:2312.03015 (2023)."},{"key":"e_1_2_2_149_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00249"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3730840","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T17:59:03Z","timestamp":1774634343000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3730840"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,27]]},"references-count":149,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,8,1]]}},"alternative-id":["10.1145\/3730840"],"URL":"https:\/\/doi.org\/10.1145\/3730840","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,27]]},"assertion":[{"value":"2025-01-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}