{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T05:07:14Z","timestamp":1776488834888,"version":"3.51.2"},"reference-count":61,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T00:00:00Z","timestamp":1759190400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T00:00:00Z","timestamp":1759190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62372016"],"award-info":[{"award-number":["62372016"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Vis. Intell."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Single image-to-3D generation is pivotal for crafting controllable 3D assets. Although recent inference-only methods have achieved impressive effects, their generation quality still lags behind that of image generation models. We attempt to leverage 3D geometric priors from the novel view diffusion model and 2D appearance priors from an image generation model to combine the geometric messages of the former and appearance priors of the latter. We note that there is a disparity between the generation priors of these two diffusion models, leading to different appearance outputs. Specifically, image generation models tend to deliver more detailed visuals, whereas novel view models produce consistent yet over-smooth results across different views. 
Directly combining them leads to suboptimal effects due to their appearance conflicts. Hence, we propose a 2D-3D hybrid Fourier score distillation objective function, called hy-FSD. It optimizes 3D Gaussians using 3D priors in the spatial domain to ensure geometric consistency, while exploiting 2D priors in the frequency domain through the Fourier transform for better visual quality. The proposed hy-FSD can be integrated into existing 3D generation methods and produce significant performance gains. With this technique, we have developed an image-to-3D generation pipeline to create high-quality 3D objects within one minute, named Fourier123. Extensive experiments demonstrate that Fourier123 excels at efficiently generating results with rapid convergence speed and a visually appealing output.<\/jats:p>","DOI":"10.1007\/s44267-025-00089-8","type":"journal-article","created":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T07:48:41Z","timestamp":1759218521000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Hybrid Fourier score distillation for efficient one image to 3D object 
generation"],"prefix":"10.1007","volume":"3","author":[{"given":"Shuzhou","family":"Yang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haijie","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiarui","family":"Meng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanmin","family":"Wu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiandong","family":"Meng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5486-3125","authenticated-orcid":false,"given":"Jian","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,9,30]]},"reference":[{"key":"89_CR1","first-page":"704","volume-title":"Proceedings of the 15th European conference on computer vision","author":"A. Ranjan","year":"2018","unstructured":"Ranjan, A., Bolkart, T., Sanyal, S., & Black, M. J. (2018). Generating 3D faces using convolutional mesh autoencoders. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Proceedings of the 15th European conference on computer vision (pp. 704\u2013720). Cham: Springer."},{"key":"89_CR2","first-page":"55","volume-title":"Proceedings of the 15th European conference on computer vision","author":"N. Wang","year":"2018","unstructured":"Wang, N., Zhang, Y., Li, Z., Fu, Y., Liu, W., & Jiang, Y.-G. (2018). Pixel2mesh: generating 3D mesh models from single RGB images. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Proceedings of the 15th European conference on computer vision (pp. 55\u201371). 
Cham: Springer."},{"issue":"4","key":"89_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3386569.3392415","volume":"39","author":"R. Hanocka","year":"2020","unstructured":"Hanocka, R., Metzer, G., Giryes, R., & Cohen-Or, D. (2020). Point2mesh: a self-prior for deformable meshes. ACM Transactions on Graphics, 39(4), 1\u201312.","journal-title":"ACM Transactions on Graphics"},{"key":"89_CR4","first-page":"2672","volume-title":"Proceedings of the 28th international conference on neural information processing systems","author":"I. Goodfellow","year":"2014","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K.\u00a0Q. Weinberger (Eds.), Proceedings of the 28th international conference on neural information processing systems (pp. 2672\u20132680). Red Hook: Curran Associates."},{"key":"89_CR5","first-page":"1","volume-title":"Proceedings of the 34th international conference on neural information processing systems","author":"J. Ho","year":"2020","unstructured":"Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Proceedings of the 34th international conference on neural information processing systems (pp. 1\u201312). Red Hook: Curran Associates."},{"key":"89_CR6","unstructured":"Yang, S., Li, X., Cun, X., Wang, G., Li, L., Shan, Y., & Zhang, J. (2025). Gencompositor: generative video compositing with diffusion transformer. arXiv preprint. arXiv:2509.02460."},{"key":"89_CR7","first-page":"1","volume-title":"Proceedings of the 12th international conference on learning representations","author":"Y. Hong","year":"2024","unstructured":"Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., & Tan, H. (2024). 
LRM: large reconstruction model for single image to 3D. In Proceedings of the 12th international conference on learning representations (pp. 1\u201325). Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=sllU8vvsFF."},{"key":"89_CR8","first-page":"1","volume-title":"Proceedings of the 37th international conference on neural information processing systems","author":"Z. Wang","year":"2023","unstructured":"Wang, Z., Lu, C., Wang, Y., Bao, F., Li, C., Su, H., & Zhu, J. (2023). Prolificdreamer: high-fidelity and diverse text-to-3D generation with variational score distillation. In A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine (Eds.), Proceedings of the 37th international conference on neural information processing systems (pp. 1\u201336). Red Hook: Curran Associates."},{"key":"89_CR9","first-page":"9264","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"R. Liu","year":"2023","unstructured":"Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., & Vondrick, C. (2023). Zero-1-to-3: zero-shot one image to 3D object. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 9264\u20139275). Piscataway: IEEE."},{"key":"89_CR10","unstructured":"Wang, P., & Shi, Y. (2023). Imagedream: image-prompt multi-view diffusion for 3D generation. arXiv preprint. arXiv:2312.02201."},{"key":"89_CR11","first-page":"1","volume-title":"Proceedings of the 12th international conference on learning representations","author":"Y. Shi","year":"2024","unstructured":"Shi, Y., Wang, P., Ye, J., Mai, L., Li, K., & Yang, X. (2024). MVDream: multi-view diffusion for 3D generation. In Proceedings of the 12th international conference on learning representations (pp. 1\u201321). Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=FUgrjq2pbB."},{"key":"89_CR12","unstructured":"Nichol, A., Jun, H., Dhariwal, P., Mishkin, P., & Chen, M. (2022). 
Point-E: a system for generating 3D point clouds from complex prompts. arXiv preprint. arXiv:2212.08751."},{"key":"89_CR13","unstructured":"Yang, S., Cun, X., Li, X., Li, Y., & Zhang, J. (2025). 4DVD: cascaded dense-view video diffusion model for high-quality 4D content generation. arXiv preprint. arXiv:2508.04467."},{"key":"89_CR14","first-page":"10674","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"R. Rombach","year":"2022","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 10674\u201310685). Piscataway: IEEE."},{"key":"89_CR15","first-page":"300","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"C.-H. Lin","year":"2023","unstructured":"Lin, C.-H., Gao, J., Tang, L., Takikawa, T., Zeng, X., Huang, X., Kreis, K., Fidler, S., Liu, M.-Y., & Lin, T.-Y. (2023). Magic3D: high-resolution text-to-3D content creation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 300\u2013309). Piscataway: IEEE."},{"key":"89_CR16","first-page":"1","volume-title":"Proceedings of the 11th international conference on learning representations","author":"B. Poole","year":"2023","unstructured":"Poole, B., Jain, A., Barron, J. T., & Mildenhall, B. (2023). Dreamfusion: text-to-3D using 2D diffusion. In Proceedings of the 11th international conference on learning representations (pp. 1\u201318). Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=FjNys5c7VyY."},{"key":"89_CR17","first-page":"1","volume-title":"NeurIPS 2021 workshop on deep generative models and downstream applications","author":"J. Ho","year":"2021","unstructured":"Ho, J., & Salimans, T. (2021). Classifier-free diffusion guidance. 
In NeurIPS 2021 workshop on deep generative models and downstream applications (pp. 1\u201314). Red Hook: Curran Associates."},{"key":"89_CR18","first-page":"1","volume-title":"Proceedings of the 12th international conference on learning representations","author":"J. Tang","year":"2024","unstructured":"Tang, J., Ren, J., Zhou, H., Liu, Z., & Zeng, G. (2024). DreamGaussian: generative Gaussian splatting for efficient 3D content creation. In Proceedings of the 12th international conference on learning representations (pp. 1\u201318). Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=UyNXMqnN3c."},{"key":"89_CR19","first-page":"1","volume-title":"Proceedings of the 12th international conference on learning representations","author":"G. Qian","year":"2024","unstructured":"Qian, G., Mai, J., Hamdi, A., Ren, J., Siarohin, A., Li, B., Lee, H.-Y., Skorokhodov, I., Wonka, P., Tulyakov, S., & Ghanem, B. (2024). Magic123: one image to high-quality 3D object generation using both 2D and 3D diffusion priors. In Proceedings of the 12th international conference on learning representations (pp. 1\u201318). Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=0jHkUDyEO9."},{"key":"89_CR20","first-page":"27171","volume-title":"Proceedings of the 35th international conference on neural information processing systems","author":"P. Wang","year":"2021","unstructured":"Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., & Wang, W. (2021). Neus: learning neural implicit surfaces by volume rendering for multi-view reconstruction. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, & J. W. Vaughan (Eds.), Proceedings of the 35th international conference on neural information processing systems (pp. 27171\u201327183). Red Hook: Curran Associates."},{"issue":"1","key":"89_CR21","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1145\/3503250","volume":"65","author":"B. Mildenhall","year":"2021","unstructured":"Mildenhall, B., Srinivasan, P. 
P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2021). NeRF: representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1), 99\u2013106.","journal-title":"Communications of the ACM"},{"key":"89_CR22","first-page":"5835","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"J. T. Barron","year":"2021","unstructured":"Barron, J. T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., & Srinivasan, P. P. (2021). Mip-NeRF: a multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 5835\u20135844). Piscataway: IEEE."},{"key":"89_CR23","first-page":"5845","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"K. Park","year":"2021","unstructured":"Park, K., Sinha, U., Barron, J. T., Bouaziz, S., Goldman, D. B., Seitz, S. M., & Martin-Brualla, R. (2021). Nerfies: deformable neural radiance fields. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 5845\u20135854). Piscataway: IEEE."},{"key":"89_CR24","unstructured":"Wang, Y., Yang, S., Hu, Y., & Zhang, J. (2022). NeRFocus: neural radiance field for 3D synthetic defocus. arXiv preprint. arXiv:2203.05189."},{"key":"89_CR25","first-page":"8456","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"Z. Li","year":"2023","unstructured":"Li, Z., M\u00fcller, T., Evans, A., Taylor, R. H., Unberath, M., Liu, M.-Y., & Lin, C.-H. (2023). Neuralangelo: high-fidelity neural surface reconstruction. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 8456\u20138465). Piscataway: IEEE."},{"key":"89_CR26","first-page":"1435","volume-title":"Proceedings of the AAAI conference on artificial intelligence","author":"Z. 
Cui","year":"2024","unstructured":"Cui, Z., Gu, L., Sun, X., Ma, X., Qiao, Y., & Harada, T. (2024). Aleth-NeRF: illumination adaptive nerf with concealing field assumption. In Proceedings of the AAAI conference on artificial intelligence (pp. 1435\u20131444). Palo Alto: AAAI Press."},{"key":"89_CR27","first-page":"12872","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"S. Yang","year":"2023","unstructured":"Yang, S., Ding, M., Wu, Y., Li, Z., & Zhang, J. (2023). Implicit neural representation for cooperative low-light image enhancement. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 12872\u201312881). Piscataway: IEEE."},{"key":"89_CR28","first-page":"1435","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"A. Jain","year":"2022","unstructured":"Jain, A., Mildenhall, B., Barron, J. T., Abbeel, P., & Poole, B. (2022). Zero-shot text-guided object generation with dream fields. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 1435\u20131444). Piscataway: IEEE."},{"key":"89_CR29","first-page":"1","volume-title":"Proceedings of the 12th international conference on learning representations","author":"K. Lee","year":"2024","unstructured":"Lee, K., Sohn, K., & Shin, J. (2024). Dreamflow: high-quality text-to-3D generation by approximating probability flow. In Proceedings of the 12th international conference on learning representations (pp. 1\u201326). Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=GURqUuTebY."},{"key":"89_CR30","first-page":"5732","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"A. Yu","year":"2021","unstructured":"Yu, A., Li, R., Tancik, M., Li, H., Ng, R., & Kanazawa, A. (2021). Plenoctrees for real-time rendering of neural radiance fields. 
In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 5732\u20135741). Piscataway: IEEE."},{"key":"89_CR31","first-page":"5491","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"S. Fridovich-Keil","year":"2022","unstructured":"Fridovich-Keil, S., Yu, A., Tancik, M., Chen, Q., Recht, B., & Kanazawa, A. (2022). Plenoxels: radiance fields without neural networks. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 5491\u20135500). Piscataway: IEEE."},{"issue":"4","key":"89_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3528223.3530127","volume":"41","author":"T. M\u00fcller","year":"2022","unstructured":"M\u00fcller, T., Evans, A., Schied, C., & Keller, A. (2022). Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 41(4), 1\u201315.","journal-title":"ACM Transactions on Graphics"},{"issue":"4","key":"89_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3592433","volume":"42","author":"B. Kerbl","year":"2023","unstructured":"Kerbl, B., Kopanas, G., Leimkuehler, T., & Drettakis, G. (2023). 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 1\u201314.","journal-title":"ACM Transactions on Graphics"},{"key":"89_CR34","doi-asserted-by":"publisher","first-page":"800","DOI":"10.1109\/3DV62453.2024.00044","volume-title":"Proceedings of the 2024 international conference on 3D vision","author":"J. Luiten","year":"2024","unstructured":"Luiten, J., Kopanas, G., Leibe, B., & Ramanan, D. (2024). Dynamic 3D Gaussians: tracking by persistent dynamic view synthesis. In Proceedings of the 2024 international conference on 3D vision (pp. 800\u2013809). 
Piscataway: IEEE."},{"key":"89_CR35","first-page":"1","volume-title":"Proceedings of the IEEE international conference on visual communications and image processing","author":"J. Meng","year":"2024","unstructured":"Meng, J., Li, H., Wu, Y., Gao, Q., Yang, S., Zhang, J., & Ma, S. (2024). Mirror-3DGS: incorporating mirror reflections into 3D Gaussian splatting. In Proceedings of the IEEE international conference on visual communications and image processing (pp. 1\u20135). Piscataway: IEEE."},{"key":"89_CR36","first-page":"1","volume-title":"Proceedings of the 12th international conference on learning representations","author":"Z. Yang","year":"2024","unstructured":"Yang, Z., Yang, H., Pan, Z., & Zhang, L. (2024). Real-time photorealistic dynamic scene representation and rendering with 4D Gaussian splatting. In Proceedings of the 12th international conference on learning representations (pp. 1\u201318). Retrieved September 14, 2025, from https:\/\/openreview.net\/pdf?id=WhgB5sispV."},{"key":"89_CR37","first-page":"6796","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"T. Yi","year":"2024","unstructured":"Yi, T., Fang, J., Wang, J., Wu, G., Xie, L., Zhang, X., Liu, W., Tian, Q., & Wang, X. (2024). GaussianDreamer: fast generation from text to 3D Gaussians by bridging 2D and 3D diffusion models. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 6796\u20136807). Piscataway: IEEE."},{"key":"89_CR38","first-page":"15162","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"A. Trevithick","year":"2021","unstructured":"Trevithick, A., & Yang, B. (2021). GRF: learning a general radiance field for 3D representation and rendering. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 15162\u201315172). 
Piscataway: IEEE."},{"key":"89_CR39","first-page":"4578","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"A. Yu","year":"2021","unstructured":"Yu, A., Ye, V., Tancik, M., & Kanazawa, A. (2021). pixelNeRF: neural radiance fields from one or few images. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 4578\u20134587). Piscataway: IEEE."},{"key":"89_CR40","first-page":"1526","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"S. Duggal","year":"2022","unstructured":"Duggal, S., & Pathak, D. (2022). Topologically-aware deformation fields for single-view 3D reconstruction. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 1526\u20131536). Piscataway: IEEE."},{"key":"89_CR41","first-page":"1","volume-title":"Proceedings of the 9th international conference on learning representations","author":"J. Song","year":"2021","unstructured":"Song, J., Meng, C., & Ermon, S. (2021). Denoising diffusion implicit models. In Proceedings of the 9th international conference on learning representations (pp. 1\u201320). Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=St1giarCHLP."},{"key":"89_CR42","first-page":"1","volume-title":"Proceedings of the 36th international conference on neural information processing systems","author":"C. Saharia","year":"2022","unstructured":"Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E. L., Ghasemipour, K., Gontijo Lopes, R., Karagol Ayan, B., Salimans, T., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, & A. Oh (Eds.), Proceedings of the 36th international conference on neural information processing systems (pp. 1\u201316). 
Red Hook: Curran Associates."},{"key":"89_CR43","first-page":"9970","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"X. Long","year":"2024","unstructured":"Long, X., Guo, Y.-C., Lin, C., Liu, Y., Dou, Z., Liu, L., Ma, Y., Zhang, S.-H., Habermann, M., Theobalt, C., & Wang, W. (2024). Wonder3D: single image to 3D using cross-domain diffusion. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 9970\u20139980). Piscataway: IEEE."},{"key":"89_CR44","first-page":"8446","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"L. Melas-Kyriazi","year":"2023","unstructured":"Melas-Kyriazi, L., Laina, I., Rupprecht, C., & Vedaldi, A. (2023). Realfusion: 360\u00b0 reconstruction of any object from a single image. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 8446\u20138455). Piscataway: IEEE."},{"key":"89_CR45","first-page":"9892","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"Z. Wu","year":"2024","unstructured":"Wu, Z., Zhou, P., Yi, X., Yuan, X., & Zhang, H. (2024). Consistent3D: towards consistent high-fidelity text-to-3D generation with deterministic sampling prior. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 9892\u20139902). Piscataway: IEEE."},{"key":"89_CR46","first-page":"1","volume-title":"Proceedings of the 12th international conference on learning representations","author":"J. Zhu","year":"2024","unstructured":"Zhu, J., Zhuang, P., & Koyejo, S. (2024). HIFA: high-fidelity text-to-3D generation with advanced diffusion guidance. In Proceedings of the 12th international conference on learning representations (pp. 1\u201325). 
Retrieved September 14, 2025, from https:\/\/openreview.net\/forum?id=IZMPWmcS3H."},{"key":"89_CR47","first-page":"1","volume-title":"Proceedings of the 18th European conference on computer vision","author":"J. Tang","year":"2024","unstructured":"Tang, J., Chen, Z., Chen, X., Wang, T., Zeng, G., & Liu, Z. (2024). LGM: large multi-view Gaussian model for high-resolution 3D content creation. In A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, & G. Varol (Eds.), Proceedings of the 18th European conference on computer vision (pp. 1\u201318). Cham: Springer."},{"key":"89_CR48","first-page":"57","volume-title":"Proceedings of the 18th European conference on computer vision","author":"Z. Wang","year":"2024","unstructured":"Wang, Z., Wang, Y., Chen, Y., Xiang, C., Chen, S., Yu, D., Li, C., Su, H., & Zhu, J. (2024). CRM: single image to 3D textured mesh with convolutional reconstruction model. In A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, & G. Varol (Eds.), Proceedings of the 18th European conference on computer vision (pp. 57\u201374). Cham: Springer."},{"key":"89_CR49","unstructured":"Xu, J., Cheng, W., Gao, Y., Wang, X., Gao, S., & Shan, Y. (2024). Instantmesh: efficient 3D mesh generation from a single image with sparse-view large reconstruction models. arXiv preprint. arXiv:2404.07191."},{"issue":"6","key":"89_CR50","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3550454.3555497","volume":"41","author":"G. Kopanas","year":"2022","unstructured":"Kopanas, G., Leimk\u00fchler, T., Rainer, G., Jambon, C., & Drettakis, G. (2022). Neural point catacaustics for novel-view synthesis of reflections. ACM Transactions on Graphics, 41(6), 1\u201315.","journal-title":"ACM Transactions on Graphics"},{"key":"89_CR51","first-page":"8748","volume-title":"Proceedings of the international conference on machine learning","author":"A. Radford","year":"2021","unstructured":"Radford, A., Kim, J. 
W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. In Proceedings of the international conference on machine learning (pp. 8748\u20138763). Retrieved September 14, 2025, from http:\/\/proceedings.mlr.press\/v139\/radford21a.html."},{"key":"89_CR52","first-page":"1381","volume-title":"Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing","author":"M. Frigo","year":"1998","unstructured":"Frigo, M., & Johnson, S. G. (1998). FFTW: an adaptive software architecture for the FFT. In Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing (pp. 1381\u20131384). Piscataway: IEEE."},{"key":"89_CR53","doi-asserted-by":"publisher","first-page":"11941","DOI":"10.1007\/978-3-030-96530-3","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision","author":"X. Zhai","year":"2023","unstructured":"Zhai, X., Mustafa, B., Kolesnikov, A., & Beyer, L. (2023). Sigmoid loss for language image pre-training. In Proceedings of the IEEE\/CVF international conference on computer vision (pp. 11941\u201311952). Piscataway: IEEE."},{"key":"89_CR54","first-page":"13142","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"M. Deitke","year":"2023","unstructured":"Deitke, M., Schwenk, D., Salvador, J., Weihs, L., Michel, O., VanderBilt, E., Schmidt, L., Ehsani, K., Kembhavi, A., & Farhadi, A. (2023). Objaverse: a universe of annotated 3D objects. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 13142\u201313153). Piscataway: IEEE."},{"key":"89_CR55","first-page":"803","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"T. 
Wu","year":"2023","unstructured":"Wu, T., Zhang, J., Fu, X., Wang, Y., Ren, J., Pan, L., Wu, W., Yang, L., Wang, J., Qian, C., et al. (2023). Omniobject3D: large-vocabulary 3D object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 803\u2013814). Piscataway: IEEE."},{"key":"89_CR56","first-page":"2553","volume-title":"Proceedings of the international conference on robotics and automation","author":"L. Downs","year":"2022","unstructured":"Downs, L., Francis, A., Koenig, N., Kinman, B., Hickman, R., Reymann, K., McHugh, T. B., & Vanhoucke, V. (2022). Google scanned objects: a high-quality dataset of 3D scanned household items. In Proceedings of the international conference on robotics and automation (pp. 2553\u20132560). Piscataway: IEEE."},{"issue":"4","key":"89_CR57","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1109\/TIP.2003.819861","volume":"13","author":"Z. Wang","year":"2004","unstructured":"Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600\u2013612.","journal-title":"IEEE Transactions on Image Processing"},{"key":"89_CR58","first-page":"586","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"R. Zhang","year":"2018","unstructured":"Zhang, R., Isola, P., Efros, A. A., Shechtman, E., & Wang, O. (2018). The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 586\u2013595). Piscataway: IEEE."},{"key":"89_CR59","first-page":"6626","volume-title":"Proceedings of the 31st international conference on neural information processing systems","author":"M. 
Heusel","year":"2017","unstructured":"Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local Nash equilibrium. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Proceedings of the 31st international conference on neural information processing systems (pp. 6626\u20136637). Red Hook: Curran Associates."},{"key":"89_CR60","unstructured":"Shi, R., Chen, H., Zhang, Z., Liu, M., Xu, C., Wei, X., Chen, L., Zeng, C., & Su, H. (2023). Zero123++: a single image to consistent multi-view diffusion base model. arXiv preprint. arXiv:2310.15110."},{"key":"89_CR61","first-page":"21469","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition","author":"J. Xiang","year":"2025","unstructured":"Xiang, J., Lv, Z., Xu, S., Deng, Y., Wang, R., Zhang, B., Chen, D., Tong, X., & Yang, J. (2025). Structured 3D latents for scalable and versatile 3D generation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 21469\u201321480). 
Piscataway: IEEE."}],"container-title":["Visual Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44267-025-00089-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s44267-025-00089-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s44267-025-00089-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T09:04:09Z","timestamp":1759223049000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s44267-025-00089-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,30]]},"references-count":61,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["89"],"URL":"https:\/\/doi.org\/10.1007\/s44267-025-00089-8","relation":{},"ISSN":["2097-3330","2731-9008"],"issn-type":[{"value":"2097-3330","type":"print"},{"value":"2731-9008","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,30]]},"assertion":[{"value":"3 March 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 September 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 September 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 September 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or 
non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"17"}}