{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:44:19Z","timestamp":1777657459082,"version":"3.51.4"},"publisher-location":"Cham","reference-count":111,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783031729515","type":"print"},{"value":"9783031729522","type":"electronic"}],"license":[{"start":{"date-parts":[[2024,10,1]],"date-time":"2024-10-01T00:00:00Z","timestamp":1727740800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2024,10,1]],"date-time":"2024-10-01T00:00:00Z","timestamp":1727740800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025]]},"DOI":"10.1007\/978-3-031-72952-2_4","type":"book-chapter","created":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T05:02:02Z","timestamp":1727672522000},"page":"53-72","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["TC4D: Trajectory-Conditioned Text-to-4D Generation"],"prefix":"10.1007","author":[{"given":"Sherwin","family":"Bahmani","sequence":"first","affiliation":[]},{"given":"Xian","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Wang","family":"Yifan","sequence":"additional","affiliation":[]},{"given":"Ivan","family":"Skorokhodov","sequence":"additional","affiliation":[]},{"given":"Victor","family":"Rong","sequence":"additional","affiliation":[]},{"given":"Ziwei","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xihui","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Jeong Joon","family":"Park","sequence":"additional","affiliation":[]},{"given":"Sergey","family":"Tulyakov","sequence":"additional","affiliation":[]},{"given":"Gordon","family":"Wetzstein","sequence":"additional","affiliation":[]},{"given":"Andrea","family":"Tagliasacchi","sequence":"additional","affiliation":[]},{"given":"David B.","family":"Lindell","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,10,1]]},"reference":[{"key":"4_CR1","unstructured":"Zeroscope text-to-video model. https:\/\/huggingface.co\/cerspense\/zeroscope_v2_576w. Accessed 31 Oct 2023"},{"key":"4_CR2","unstructured":"Bahmani, S., et al.: 3D-aware video generation. TMLR (2023)"},{"key":"4_CR3","doi-asserted-by":"crossref","unstructured":"Bahmani, S., et al.: CC3D: layout-conditioned generation of compositional 3D scenes. In: Proceedings of ICCV (2023)","DOI":"10.1109\/ICCV51070.2023.00659"},{"key":"4_CR4","doi-asserted-by":"crossref","unstructured":"Bahmani, S., et al.: 4D-fy: text-to-4D generation using hybrid score distillation sampling. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00764"},{"key":"4_CR5","unstructured":"Bai, H., et al.: CompoNeRF: Text-guided multi-object compositional NeRF with editable 3D scene layout. arXiv preprint arXiv:2303.13843 (2023)"},{"key":"4_CR6","unstructured":"Bai, J., et al.: Uniedit: a unified tuning-free framework for video motion and appearance editing. arXiv preprint arXiv:2402.13185 (2024)"},{"key":"4_CR7","doi-asserted-by":"crossref","unstructured":"Bain, M., Nagrani, A., Varol, G., Zisserman, A.: Frozen in time: a joint video and image encoder for end-to-end retrieval. In: Proceedings of ICCV (2021)","DOI":"10.1109\/ICCV48922.2021.00175"},{"key":"4_CR8","unstructured":"Bie, F., et al.: RenAIssance: a survey into AI text-to-image generation in the era of large model. arXiv preprint arXiv:2309.00810 (2023)"},{"key":"4_CR9","unstructured":"Blattmann, A., et al.: Stable video diffusion: scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)"},{"key":"4_CR10","doi-asserted-by":"crossref","unstructured":"Blattmann, A., et al.: Align your latents: high-resolution video synthesis with latent diffusion models. In: Proceedings of CVPR (2023)","DOI":"10.1109\/CVPR52729.2023.02161"},{"key":"4_CR11","unstructured":"Brooks, T., et al.: Video generation models as world simulators (2024). https:\/\/openai.com\/research\/video-generation-models-as-world-simulators"},{"key":"4_CR12","doi-asserted-by":"crossref","unstructured":"Chan, E.R., et al.: Efficient geometry-aware 3D generative adversarial networks. In: Proceedings of CVPR (2022)","DOI":"10.1109\/CVPR52688.2022.01565"},{"key":"4_CR13","doi-asserted-by":"crossref","unstructured":"Chan, E.R., et al.: Generative novel view synthesis with 3D-aware diffusion models. In: Proceedings of ICCV (2023)","DOI":"10.1109\/ICCV51070.2023.00389"},{"key":"4_CR14","doi-asserted-by":"crossref","unstructured":"Chen, H., et al.: VideoCrafter2: overcoming data limitations for high-quality video diffusion models. arXiv preprint arXiv:2401.09047 (2024)","DOI":"10.1109\/CVPR52733.2024.00698"},{"key":"4_CR15","doi-asserted-by":"crossref","unstructured":"Chen, K., Choy, C.B., Savva, M., Chang, A.X., Funkhouser, T., Savarese, S.: Text2Shape: generating shapes from natural language by learning joint embeddings. In: Proceedings of ACCV (2018)","DOI":"10.1007\/978-3-030-20893-6_7"},{"key":"4_CR16","doi-asserted-by":"crossref","unstructured":"Chen, R., Chen, Y., Jiao, N., Jia, K.: Fantasia3D: disentangling geometry and appearance for high-quality text-to-3D content creation. arXiv preprint arXiv:2303.13873 (2023)","DOI":"10.1109\/ICCV51070.2023.02033"},{"key":"4_CR17","doi-asserted-by":"crossref","unstructured":"Chen, Y., Wang, T., Wu, T., Pan, X., Jia, K., Liu, Z.: ComboVerse: compositional 3D assets creation using spatially-aware diffusion guidance. arXiv preprint arXiv:2403.12409 (2024)","DOI":"10.1007\/978-3-031-72691-0_8"},{"key":"4_CR18","doi-asserted-by":"crossref","unstructured":"Cohen-Bar, D., Richardson, E., Metzer, G., Giryes, R., Cohen-Or, D.: Set-the-scene: global-local training for generating controllable NeRF scenes. In: Proceedings of ICCV Workshops (2023)","DOI":"10.1109\/ICCVW60793.2023.00314"},{"key":"4_CR19","doi-asserted-by":"crossref","unstructured":"DeVries, T., Bautista, M.A., Srivastava, N., Taylor, G.W., Susskind, J.M.: Unconstrained scene generation with locally conditioned radiance fields. In: Proceedings of ICCV (2021)","DOI":"10.1109\/ICCV48922.2021.01404"},{"key":"4_CR20","unstructured":"Epstein, D., Poole, B., Mildenhall, B., Efros, A.A., Holynski, A.: Disentangled 3D scene generation with layout learning. In: Proceedings of ICML (2024)"},{"key":"4_CR21","unstructured":"Feng, Q., Xing, Z., Wu, Z., Jiang, Y.G.: FDGaussian: fast Gaussian splatting from single image via geometric-aware diffusion model. arXiv preprint arXiv:2403.10242 (2024)"},{"key":"4_CR22","doi-asserted-by":"crossref","unstructured":"Gao, G., Liu, W., Chen, A., Geiger, A., Sch\u00f6lkopf, B.: GraphDreamer: compositional 3D scene synthesis from scene graphs. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.02012"},{"key":"4_CR23","unstructured":"Gao, Q., et al.: GaussianFlow: Splatting Gaussian dynamics for 4D content creation. arXiv preprint arXiv:2403.12365 (2024)"},{"key":"4_CR24","doi-asserted-by":"crossref","unstructured":"Gao, W., Aigerman, N., Groueix, T., Kim, V., Hanocka, R.: TextDeformer: geometry manipulation using text guidance. In: Proceedings of SIGGRAPH (2023)","DOI":"10.1145\/3588432.3591552"},{"key":"4_CR25","unstructured":"Gu, J., et al.: NerfDiff: single-image view synthesis with NeRF-guided distillation from 3D-aware diffusion. In: Proceedings of ICML (2023)"},{"key":"4_CR26","unstructured":"Guo, Y., et al.: AnimateDiff: animate your personalized text-to-image diffusion models without specific tuning. In: Proceedings of ICLR (2024)"},{"key":"4_CR27","doi-asserted-by":"crossref","unstructured":"Han, J., Kokkinos, F., Torr, P.: VFusion3D: learning scalable 3D generative models from video diffusion models. arXiv preprint arXiv:2403.12034 (2024)","DOI":"10.1007\/978-3-031-72627-9_19"},{"key":"4_CR28","doi-asserted-by":"crossref","unstructured":"He, X., et al.: GVGEN: text-to-3D generation with volumetric representation. arXiv preprint arXiv:2403.12957 (2024)","DOI":"10.1007\/978-3-031-73242-3_26"},{"key":"4_CR29","unstructured":"He, Y., Yang, T., Zhang, Y., Shan, Y., Chen, Q.: Latent video diffusion models for high-fidelity video generation with arbitrary lengths. arXiv preprint arXiv:2211.13221 (2022)"},{"key":"4_CR30","unstructured":"Ho, J., et\u00a0al.: Imagen video: high definition video generation with diffusion models. arXiv preprint arXiv:2210.02303 (2022)"},{"key":"4_CR31","doi-asserted-by":"crossref","unstructured":"H\u00f6llein, L., et al.: ViewDiff: 3D-consistent image generation with text-to-image models. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00482"},{"key":"4_CR32","unstructured":"Hong, Y., et al.: LRM: large reconstruction model for single image to 3D. In: Proceedings of ICLR (2024)"},{"key":"4_CR33","doi-asserted-by":"crossref","unstructured":"Jain, A., Mildenhall, B., Barron, J.T., Abbeel, P., Poole, B.: Zero-shot text-guided object generation with dream fields. In: Proceedings of CVPR (2022)","DOI":"10.1109\/CVPR52688.2022.00094"},{"key":"4_CR34","unstructured":"Jetchev, N.: ClipMatrix: text-controlled creation of 3D textured meshes. arXiv preprint arXiv:2109.12922 (2021)"},{"key":"4_CR35","unstructured":"Jiang, L., Wang, L.: Brightdreamer: generic 3D Gaussian generative framework for fast text-to-3D synthesis. arXiv preprint arXiv:2403.11273 (2024)"},{"key":"4_CR36","unstructured":"Jiang, Y., Zhang, L., Gao, J., Hu, W., Yao, Y.: Consistent4D: consistent 360$$^{\\circ }$$ dynamic object generation from monocular video. arXiv preprint arXiv:2311.02848 (2023)"},{"key":"4_CR37","unstructured":"Katzir, O., Patashnik, O., Cohen-Or, D., Lischinski, D.: Noise-free score distillation. In: Proceedings of ICLR (2024)"},{"key":"4_CR38","doi-asserted-by":"crossref","unstructured":"Kim, S.W., et al.: NeuralField-LDM: scene generation with hierarchical latent diffusion models. In: Proceedings of CVPR (2023)","DOI":"10.1109\/CVPR52729.2023.00821"},{"key":"4_CR39","unstructured":"Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of ICLR (2015)"},{"key":"4_CR40","unstructured":"Lee, K., Sohn, K., Shin, J.: DreamFlow: high-quality text-to-3D generation by approximating probability flow. In: Proceedings of ICLR (2024)"},{"key":"4_CR41","unstructured":"Li, J., et al.: Instant3D: fast text-to-3D with sparse-view generation and large reconstruction model. In: Proceedings of ICLR (2024)"},{"key":"4_CR42","unstructured":"Li, R., Tancik, M., Kanazawa, A.: NerfAcc: a general NeRF acceleration toolbox. In: Proceedings of ICCV (2023)"},{"key":"4_CR43","unstructured":"Li, Z., Chen, Y., Zhao, L., Liu, P.: Controllable text-to-3D generation via surface-aligned Gaussian splatting. arXiv preprint arXiv:2403.09981 (2024)"},{"key":"4_CR44","doi-asserted-by":"crossref","unstructured":"Liang, Y., Yang, X., Lin, J., Li, H., Xu, X., Chen, Y.: Luciddreamer: towards high-fidelity text-to-3D generation via interval score matching. arXiv preprint arXiv:2311.11284 (2023)","DOI":"10.1109\/CVPR52733.2024.00623"},{"key":"4_CR45","doi-asserted-by":"crossref","unstructured":"Lin, C.H., et al.: Magic3D: high-resolution text-to-3D content creation. In: Proceedings of CVPR (2023)","DOI":"10.1109\/CVPR52729.2023.00037"},{"key":"4_CR46","doi-asserted-by":"crossref","unstructured":"Lin, Y., Han, H., Gong, C., Xu, Z., Zhang, Y., Li, X.: Consistent123: one image to highly consistent 3D asset using case-aware diffusion priors. arXiv preprint arXiv:2309.17261 (2023)","DOI":"10.1145\/3664647.3680994"},{"key":"4_CR47","doi-asserted-by":"crossref","unstructured":"Ling, H., Kim, S.W., Torralba, A., Fidler, S., Kreis, K.: Align your Gaussians: text-to-4D with dynamic 3D Gaussians and composed diffusion models. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00819"},{"key":"4_CR48","unstructured":"Liu, P., et al.: Isotropic3D: Image-to-3D generation based on a single clip embedding. arXiv preprint arXiv:2403.10395 (2024)"},{"key":"4_CR49","doi-asserted-by":"crossref","unstructured":"Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., Vondrick, C.: Zero-1-to-3: zero-shot one image to 3D object. In: Proceedings of ICCV (2023)","DOI":"10.1109\/ICCV51070.2023.00853"},{"key":"4_CR50","doi-asserted-by":"crossref","unstructured":"Liu, X., et al.: HumanGaussian: text-driven 3D human generation with Gaussian splatting. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00635"},{"key":"4_CR51","unstructured":"Liu, Y., et al.: SyncDreamer: generating multiview-consistent images from a single-view image. In: Proceedings of ICLR (2024)"},{"key":"4_CR52","doi-asserted-by":"crossref","unstructured":"Long, X., et al.: Wonder3D: single image to 3D using cross-domain diffusion. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00951"},{"key":"4_CR53","unstructured":"Ma, X., et al.: Latte: latent diffusion transformer for video generation. arXiv preprint arXiv:2401.03048 (2024)"},{"issue":"4","key":"4_CR54","doi-asserted-by":"publisher","first-page":"3974","DOI":"10.1007\/s10489-022-03766-z","volume":"53","author":"M Masood","year":"2023","unstructured":"Masood, M., Nawaz, M., Malik, K.M., Javed, A., Irtaza, A., Malik, H.: Deepfakes generation and detection: state-of-the-art, open challenges, countermeasures, and way forward. Appl. Intell. 53(4), 3974\u20134026 (2023)","journal-title":"Appl. Intell."},{"key":"4_CR55","doi-asserted-by":"crossref","unstructured":"Menapace, W., et al.: Snap video: scaled spatiotemporal transformers for text-to-video synthesis. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00672"},{"key":"4_CR56","doi-asserted-by":"crossref","unstructured":"Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Proceedings of ECCV (2020)","DOI":"10.1007\/978-3-030-58452-8_24"},{"issue":"4","key":"4_CR57","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3528223.3530127","volume":"41","author":"T M\u00fcller","year":"2022","unstructured":"M\u00fcller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41(4), 1\u201315 (2022)","journal-title":"ACM Trans. Graph."},{"key":"4_CR58","doi-asserted-by":"crossref","unstructured":"Or-El, R., Luo, X., Shan, M., Shechtman, E., Park, J.J., Kemelmacher-Shlizerman, I.: StyleSDF: high-resolution 3D-consistent image and geometry generation. In: Proceedings of CVPR (2022)","DOI":"10.1109\/CVPR52688.2022.01314"},{"key":"4_CR59","unstructured":"Pan, Z., Yang, Z., Zhu, X., Zhang, L.: Fast dynamic 3D object generation from a single-view video. arXiv preprint arXiv:2401.08742 (2024)"},{"key":"4_CR60","doi-asserted-by":"crossref","unstructured":"Po, R., Wetzstein, G.: Compositional 3D scene generation using locally conditioned diffusion. In: Proceedings of 3DV (2024)","DOI":"10.1109\/3DV62453.2024.00026"},{"key":"4_CR61","unstructured":"Po, R., et\u00a0al.: State of the art on diffusion models for visual computing. arXiv preprint arXiv:2310.07204 (2023)"},{"key":"4_CR62","unstructured":"Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: DreamFusion: text-to-3D using 2D diffusion. In: Proceedings of ICLR (2023)"},{"key":"4_CR63","unstructured":"Qian, G., et\u00a0al.: Atom: amortized text-to-mesh using 2D diffusion. arXiv preprint arXiv:2402.00867 (2024)"},{"key":"4_CR64","unstructured":"Qian, G., et al.: Magic123: one image to high-quality 3D object generation using both 2D and 3D diffusion priors. In: Proceedings of ICLR (2024)"},{"key":"4_CR65","unstructured":"Radford, A., et al.: Learning transferable visual models from natural language supervision. In: Proceedings of ICML (2021)"},{"key":"4_CR66","unstructured":"Ren, J., et al.: DreamGaussian4D: generative 4D Gaussian splatting. arXiv preprint arXiv:2312.17142 (2023)"},{"key":"4_CR67","doi-asserted-by":"crossref","unstructured":"Sanghi, A., et al.: CLIP-forge: towards zero-shot text-to-shape generation. In: Proceedings of CVPR (2022)","DOI":"10.1109\/CVPR52688.2022.01805"},{"key":"4_CR68","unstructured":"Schwarz, K., Sauer, A., Niemeyer, M., Liao, Y., Geiger, A.: VoxGRAF: fast 3D-aware image synthesis with sparse voxel grids. In: Proceedings of NeurIPS (2022)"},{"key":"4_CR69","doi-asserted-by":"crossref","unstructured":"Shi, X., et al.: Motion-I2V: consistent and controllable image-to-video generation with explicit motion modeling. arXiv preprint arXiv:2401.15977 (2024)","DOI":"10.1145\/3641519.3657497"},{"key":"4_CR70","unstructured":"Shi, Y., Wang, P., Ye, J., Mai, L., Li, K., Yang, X.: MVDream: multi-view diffusion for 3D generation. In: Proceedings of ICLR (2024)"},{"key":"4_CR71","unstructured":"Singer, U., et al.: Make-a-video: text-to-video generation without text-video data. In: Proceedings of ICLR (2023)"},{"key":"4_CR72","unstructured":"Singer, U., et al.: Text-to-4D dynamic scene generation. In: Proceedings of ICML (2023)"},{"key":"4_CR73","doi-asserted-by":"crossref","unstructured":"Skorokhodov, I., Tulyakov, S., Elhoseiny, M.: StyleGAN-V: a continuous video generator with the price, image quality and perks of StyleGAN2. In: Proceedings of CVPR (2022)","DOI":"10.1109\/CVPR52688.2022.00361"},{"key":"4_CR74","unstructured":"Sun, J., Zhang, B., Shao, R., Wang, L., Liu, W., Xie, Z., Liu, Y.: DreamCraft3D: hierarchical 3D generation with bootstrapped diffusion prior. In: Proceedings of ICLR (2024)"},{"key":"4_CR75","doi-asserted-by":"crossref","unstructured":"Szymanowicz, S., Rupprecht, C., Vedaldi, A.: Splatter image: ultra-fast single-view 3D reconstruction. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00972"},{"key":"4_CR76","doi-asserted-by":"crossref","unstructured":"Tang, J., Chen, Z., Chen, X., Wang, T., Zeng, G., Liu, Z.: LGM: large multi-view gaussian model for high-resolution 3D content creation. In: Proceedings of ECCV (2024)","DOI":"10.1007\/978-3-031-73235-5_1"},{"key":"4_CR77","doi-asserted-by":"crossref","unstructured":"Tang, J., et al.: Make-it-3D: high-fidelity 3D creation from a single image with diffusion prior. arXiv preprint arXiv:2303.14184 (2023)","DOI":"10.1109\/ICCV51070.2023.02086"},{"key":"4_CR78","unstructured":"Tewari, A., et al.: Diffusion with forward models: solving stochastic inverse problems without direct supervision. In: Proceedings of NeurIPS (2023)"},{"key":"4_CR79","unstructured":"Tochilkin, D., et al.: Triposr: fast 3D object reconstruction from a single image. arXiv preprint arXiv:2403.02151 (2024)"},{"key":"4_CR80","unstructured":"Vaswani, A., et al.: Attention is all you need. In: Proceedings of NeurIPS (2017)"},{"key":"4_CR81","unstructured":"Vilesov, A., Chari, P., Kadambi, A.: CG3D: compositional generation for text-to-3D via Gaussian splatting. arXiv preprint arXiv:2311.17907 (2023)"},{"key":"4_CR82","doi-asserted-by":"crossref","unstructured":"Voleti, V., et al.: SV3D: novel multi-view synthesis and 3D generation from a single image using latent video diffusion. arXiv preprint arXiv:2403.12008 (2024)","DOI":"10.1007\/978-3-031-73232-4_25"},{"key":"4_CR83","doi-asserted-by":"crossref","unstructured":"Wan, Z., et al.: CAD: photorealistic 3D generation via adversarial distillation. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00971"},{"key":"4_CR84","doi-asserted-by":"crossref","unstructured":"Wang, C., Chai, M., He, M., Chen, D., Liao, J.: Clip-NeRF: text-and-image driven manipulation of neural radiance fields. In: Proceedings of CVPR (2022)","DOI":"10.1109\/CVPR52688.2022.00381"},{"key":"4_CR85","doi-asserted-by":"crossref","unstructured":"Wang, H., Du, X., Li, J., Yeh, R.A., Shakhnarovich, G.: Score Jacobian chaining: lifting pretrained 2D diffusion models for 3D generation. In: Proceedings of CVPR (2023)","DOI":"10.1109\/CVPR52729.2023.01214"},{"key":"4_CR86","unstructured":"Wang, J., et al.: Boximator: generating rich and controllable motions for video synthesis. arXiv preprint arXiv:2402.01566 (2024)"},{"key":"4_CR87","unstructured":"Wang, J., Yuan, H., Chen, D., Zhang, Y., Wang, X., Zhang, S.: Modelscope text-to-video technical report. arXiv preprint arXiv:2308.06571 (2023)"},{"key":"4_CR88","unstructured":"Wang, W., et al.: Videofactory: swap attention in spatiotemporal diffusions for text-to-video generation. arXiv preprint arXiv:2305.10874 (2023)"},{"key":"4_CR89","unstructured":"Wang, X., et al.: Videocomposer: compositional video synthesis with motion controllability. arXiv preprint arXiv:2306.02018 (2023)"},{"key":"4_CR90","unstructured":"Wang, Z., et al.: ProlificDreamer: high-fidelity and diverse text-to-3D generation with variational score distillation. In: Proceedings of NeurIPS (2023)"},{"key":"4_CR91","doi-asserted-by":"crossref","unstructured":"Wang, Z., et al.: MotionCtrl: a unified and flexible motion controller for video generation. arXiv preprint arXiv:2312.03641 (2023)","DOI":"10.1145\/3641519.3657518"},{"key":"4_CR92","doi-asserted-by":"crossref","unstructured":"Wu, R., Chen, L., Yang, T., Guo, C., Li, C., Zhang, X.: LAMP: learn a motion pattern for few-shot-based video generation. arXiv preprint arXiv:2310.10769 (2023)","DOI":"10.1109\/CVPR52733.2024.00677"},{"key":"4_CR93","doi-asserted-by":"crossref","unstructured":"Wu, T., et al.: GPT-4V(ision) is a human-aligned evaluator for text-to-3D generation. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.02098"},{"key":"4_CR94","doi-asserted-by":"crossref","unstructured":"Xie, K., et al.: LATTE3D: large-scale amortized text-to-enhanced3D synthesis. In: Proceedings of ECCV (2024)","DOI":"10.1007\/978-3-031-72980-5_18"},{"key":"4_CR95","doi-asserted-by":"crossref","unstructured":"Xu, Y., et al.: GRM: large Gaussian reconstruction model for efficient 3D reconstruction and generation. In: Proceedings of ECCV (2024)","DOI":"10.1007\/978-3-031-72633-0_1"},{"key":"4_CR96","unstructured":"Xu, Y., et al.: DMV3D: denoising multi-view diffusion using 3D large reconstruction model. In: Proceedings of ICLR (2024)"},{"key":"4_CR97","doi-asserted-by":"crossref","unstructured":"Xue, H., et al.: Advancing high-resolution video-language representation with large-scale video transcriptions. In: Proceedings of CVPR (2022)","DOI":"10.1109\/CVPR52688.2022.00498"},{"key":"4_CR98","unstructured":"Yang, Q., et al.: Beyond skeletons: integrative latent mapping for coherent 4D sequence generation. arXiv preprint arXiv:2403.13238 (2024)"},{"key":"4_CR99","doi-asserted-by":"crossref","unstructured":"Yang, S., et al.: Direct-a-video: customized video generation with user-directed camera movement and object motion. arXiv preprint arXiv:2402.03162 (2024)","DOI":"10.1145\/3641519.3657481"},{"key":"4_CR100","unstructured":"Ye, J., et al.: DreamReward: text-to-3D generation with human preference. arXiv preprint arXiv:2403.14613 (2024)"},{"key":"4_CR101","unstructured":"Yin, Y., Xu, D., Wang, Z., Zhao, Y., Wei, Y.: 4DGen: grounded 4D content generation with spatial-temporal consistency. arXiv preprint arXiv:2312.17225 (2023)"},{"key":"4_CR102","unstructured":"Yoo, P., Guo, J., Matsuo, Y., Gu, S.S.: DreamSparse: escaping from Plato\u2019s cave with 2D diffusion model given sparse views. arXiv preprint arXiv:2306.03414 (2023)"},{"key":"4_CR103","unstructured":"Yu, X., Guo, Y.C., Li, Y., Liang, D., Zhang, S.H., Qi, X.: Text-to-3D with classifier score distillation. arXiv preprint arXiv:2310.19415 (2023)"},{"key":"4_CR104","doi-asserted-by":"crossref","unstructured":"Yunus, R., et al.: Recent trends in 3D reconstruction of general non-rigid scenes. In: Computer Graphics Forum (2024)","DOI":"10.1111\/cgf.15062"},{"key":"4_CR105","doi-asserted-by":"crossref","unstructured":"Zeng, Y., et al.: STAG4D: spatial-temporal anchored generative 4D Gaussians. arXiv preprint arXiv:2403.14939 (2024)","DOI":"10.1007\/978-3-031-72764-1_10"},{"key":"4_CR106","doi-asserted-by":"crossref","unstructured":"Zhang, B., Yang, T., Li, Y., Zhang, L., Zhao, X.: Compress3D: a compressed latent space for 3D generation from a single image. arXiv preprint arXiv:2403.13524 (2024)","DOI":"10.1007\/978-3-031-72649-1_16"},{"key":"4_CR107","doi-asserted-by":"crossref","unstructured":"Zhang, Q., et al.: SceneWiz3D: towards text-guided 3D scene composition. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00652"},{"key":"4_CR108","unstructured":"Zhao, Y., Yan, Z., Xie, E., Hong, L., Li, Z., Lee, G.H.: Animate124: animating one image to 4D dynamic scene. arXiv preprint arXiv:2311.14603 (2023)"},{"key":"4_CR109","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Li, X., Nagano, K., Liu, S., Hilliges, O., De Mello, S.: A unified approach for text-and image-guided 4D scene generation. In: Proceedings of CVPR (2024)","DOI":"10.1109\/CVPR52733.2024.00697"},{"key":"4_CR110","unstructured":"Zhou, D., Wang, W., Yan, H., Lv, W., Zhu, Y., Feng, J.: Magicvideo: efficient video generation with latent diffusion models. arXiv preprint arXiv:2211.11018 (2022)"},{"key":"4_CR111","unstructured":"Zhou, X., et al.: GALA3D: towards text-to-3D complex scene generation via layout-guided generative Gaussian splatting. arXiv preprint arXiv:2402.07207 (2024)"}],"container-title":["Lecture Notes in Computer Science","Computer Vision \u2013 ECCV 2024"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-72952-2_4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,28]],"date-time":"2024-11-28T21:41:08Z","timestamp":1732830068000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-72952-2_4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,1]]},"ISBN":["9783031729515","9783031729522"],"references-count":111,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-72952-2_4","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,1]]},"assertion":[{"value":"1 October 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"Our approach can automatically generate realistic, animated 3D scenes; such techniques can be misused to promulgate realistic fake content for the purposes of misinformation. We condemn misuse of generative models in this fashion and emphasize the importance of research to thwart such efforts (see, e.g., Masood et al.\u00a0[] for an extended discussion of this topic).","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Statement"}},{"value":"ECCV","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"European Conference on Computer Vision","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Milan","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Italy","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2024","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"29 September 2024","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"4 October 2024","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"18","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"eccv2024","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/eccv2024.ecva.net\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}