{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T03:37:27Z","timestamp":1774496247368,"version":"3.50.1"},"reference-count":102,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,12,16]],"date-time":"2024-12-16T00:00:00Z","timestamp":1734307200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Synthetic Data Generation and Sim-to-Real Adaptive Learning for Real-World Human Daily Activity Recognition of Human-Care Robots","award":["TOPACS ANR-19-CE45-0015"],"award-info":[{"award-number":["TOPACS ANR-19-CE45-0015"]}]},{"DOI":"10.13039\/501100001665","name":"French National Research Agency","doi-asserted-by":"crossref","award":["Human4D ANR-19-CE23-0020"],"award-info":[{"award-number":["Human4D ANR-19-CE23-0020"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,1,31]]},"abstract":"<jats:p>\n            Facial expression generation is one of the most challenging and long-sought aspects of character animation, with many interesting applications. The challenging task, traditionally having relied heavily on digital craftspersons, remains yet to be explored. In this article, we introduce a generative framework for generating 3D facial expression sequences (i.e., 4D faces) that can be conditioned on different inputs to animate an arbitrary 3D face mesh. It is composed of two tasks: (1) learning the generative model that is trained over a set of 3D landmark sequences and (2) generating 3D mesh sequences of an input facial mesh driven by the generated landmark sequences. 
The generative model is based on a Denoising Diffusion Probabilistic Model (DDPM), which has achieved remarkable success in generative tasks across other domains. While it can be trained unconditionally, its reverse process can still be conditioned on various signals. This allows us to efficiently develop several downstream tasks involving conditional generation, using expression labels, text, partial sequences, or simply a facial geometry. To obtain the full mesh deformation, we then develop a landmark-guided encoder-decoder that applies the geometric deformation embedded in the landmarks to a given facial mesh. Experiments show that our model learns to generate realistic, high-quality expressions solely from a relatively small dataset, improving over state-of-the-art methods. Videos and qualitative comparisons with other methods can be found at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/ZOUKaifeng\/4DFM\">https:\/\/github.com\/ZOUKaifeng\/4DFM<\/jats:ext-link>\n            . 
Code and models will be made available upon acceptance.\n          <\/jats:p>","DOI":"10.1145\/3653455","type":"journal-article","created":{"date-parts":[[2024,3,28]],"date-time":"2024-03-28T13:13:12Z","timestamp":1711631592000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["4D Facial Expression Diffusion Model"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4460-3690","authenticated-orcid":false,"given":"Kaifeng","family":"Zou","sequence":"first","affiliation":[{"name":"ICube laboratory, CNRS-University of Strasbourg, Illkirch, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3763-9425","authenticated-orcid":false,"given":"Sylvain","family":"Faisan","sequence":"additional","affiliation":[{"name":"ICube laboratory, CNRS-University of Strasbourg, Illkirch, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0934-610X","authenticated-orcid":false,"given":"Boyang","family":"Yu","sequence":"additional","affiliation":[{"name":"ICube laboratory, CNRS-University of Strasbourg, Strasbourg, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7549-4808","authenticated-orcid":false,"given":"Sebastien","family":"Valette","sequence":"additional","affiliation":[{"name":"Centre de Recherche en Acquisition et Traitement de l'Image pour la Sante (CREATIS), Villeurbanne, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8851-0256","authenticated-orcid":false,"given":"Hyewon","family":"Seo","sequence":"additional","affiliation":[{"name":"ICube laboratory, CNRS-University of Strasbourg, Strasbourg, France"}]}],"member":"320","published-online":{"date-parts":[[2024,12,16]]},"reference":[{"key":"e_1_3_3_2_2","article-title":"Diffusion-based time series imputation and forecasting with structured state space models","author":"Alcaraz Juan Miguel Lopez","year":"2022","unstructured":"Juan Miguel Lopez Alcaraz and Nils Strodthoff. 2022. 
Diffusion-based time series imputation and forecasting with structured state space models. arXiv preprint arXiv:2208.09399 (2022).","journal-title":"arXiv preprint arXiv:2208.09399"},{"key":"e_1_3_3_3_2","first-page":"17981","article-title":"Structured denoising diffusion models in discrete state-spaces","volume":"34","author":"Austin Jacob","year":"2021","unstructured":"Jacob Austin, Daniel D. Johnson, Jonathan Ho, Daniel Tarlow, and Rianne van den Berg. 2021. Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems 34 (2021), 17981\u201317993.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.299"},{"key":"e_1_3_3_5_2","article-title":"Label-efficient semantic segmentation with diffusion models","author":"Baranchuk Dmitry","year":"2021","unstructured":"Dmitry Baranchuk, Ivan Rubachev, Andrey Voynov, Valentin Khrulkov, and Artem Babenko. 2021. Label-efficient semantic segmentation with diffusion models. arXiv preprint arXiv:2112.03126 (2021).","journal-title":"arXiv preprint arXiv:2112.03126"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964970"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8659.t01-1-00712"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00731"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.iswa.2022.200139"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW56347.2022.00462"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3203187"},{"key":"e_1_3_3_12_2","article-title":"Analog bits: Generating discrete data using diffusion models with self-conditioning","author":"Chen Ting","year":"2022","unstructured":"Ting Chen, Ruixiang Zhang, and Geoffrey Hinton. 2022. Analog bits: Generating discrete data using diffusion models with self-conditioning. 
arXiv preprint arXiv:2208.04202 (2022).","journal-title":"arXiv preprint arXiv:2208.04202"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00537"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00581"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126510"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126510"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.1996.517079"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00038"},{"key":"e_1_3_3_20_2","article-title":"Bert: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).","journal-title":"arXiv preprint arXiv:1810.04805"},{"key":"e_1_3_3_21_2","first-page":"8780","article-title":"Diffusion models beat GANs on image synthesis","volume":"34","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems 34 (2021), 8780\u20138794.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_22_2","article-title":"Continuous diffusion for categorical data","author":"Dieleman Sander","year":"2022","unstructured":"Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, R\u00e9mi Leblond, Will Grathwohl, and Jonas Adler. 2022. Continuous diffusion for categorical data. 
arXiv preprint arXiv:2211.15089 (2022).","journal-title":"arXiv preprint arXiv:2211.15089"},{"key":"e_1_3_3_23_2","article-title":"Nice: Non-linear independent components estimation","author":"Dinh Laurent","year":"2014","unstructured":"Laurent Dinh, David Krueger, and Yoshua Bengio. 2014. Nice: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516 (2014).","journal-title":"arXiv preprint arXiv:1410.8516"},{"key":"e_1_3_3_24_2","article-title":"Adaptive subgradient methods for online learning and stochastic optimization.","author":"Duchi John","year":"2011","unstructured":"John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12, 61 (2011), 2121\u20132159.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2010.2052239"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_33"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2707341"},{"key":"e_1_3_3_28_2","article-title":"Diffuseq: Sequence to sequence text generation with diffusion models","author":"Gong Shansan","year":"2022","unstructured":"Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and LingPeng Kong. 2022. Diffuseq: Sequence to sequence text generation with diffusion models. arXiv preprint arXiv:2210.08933 (2022).","journal-title":"arXiv preprint arXiv:2210.08933"},{"key":"e_1_3_3_29_2","first-page":"2672","volume-title":"Advances in Neural Information Processing Systems","author":"Goodfellow Ian","year":"2014","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. 
2672\u20132680."},{"key":"e_1_3_3_30_2","article-title":"Diffusion models as plug-and-play priors","author":"Graikos Alexandros","year":"2022","unstructured":"Alexandros Graikos, Nikolay Malkin, Nebojsa Jojic, and Dimitris Samaras. 2022. Diffusion models as plug-and-play priors. arXiv preprint arXiv:2206.09012 (2022).","journal-title":"arXiv preprint arXiv:2206.09012"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413635"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58529-7_10"},{"key":"e_1_3_3_33_2","article-title":"Flexible diffusion modeling of long videos","author":"Harvey William","year":"2022","unstructured":"William Harvey, Saeid Naderiparizi, Vaden Masrani, Christian Weilbach, and Frank Wood. 2022. Flexible diffusion modeling of long videos. arXiv preprint arXiv:2205.11495 (2022).","journal-title":"arXiv preprint arXiv:2205.11495"},{"key":"e_1_3_3_34_2","article-title":"Prompt-to-prompt image editing with cross attention control","author":"Hertz Amir","year":"2022","unstructured":"Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626 (2022).","journal-title":"arXiv preprint arXiv:2208.01626"},{"key":"e_1_3_3_35_2","article-title":"GANs trained by a two time-scale update rule converge to a local Nash equilibrium","volume":"30","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. 
Advances in Neural Information Processing Systems 30 (2017).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_36_2","article-title":"Imagen video: High definition video generation with diffusion models","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, et\u00a0al. 2022. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303 (2022).","journal-title":"arXiv preprint arXiv:2210.02303"},{"key":"e_1_3_3_37_2","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"47","key":"e_1_3_3_38_2","first-page":"1","article-title":"Cascaded diffusion models for high fidelity image generation.","volume":"23","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, and Tim Salimans. 2022. Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research 23, 47 (2022), 1\u201333.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_39_2","article-title":"Video diffusion models","author":"Ho Jonathan","year":"2022","unstructured":"Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. 2022. Video diffusion models. 
arXiv preprint arXiv:2204.03458 (2022).","journal-title":"arXiv preprint arXiv:2204.03458"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01223"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201283"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i7.25996"},{"key":"e_1_3_3_45_2","first-page":"3581","volume-title":"Advances in Neural Information Processing Systems","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. 2014. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems. 3581\u20133589."},{"key":"e_1_3_3_46_2","volume-title":"International Conference on Learning Representations","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational Bayes. In International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1312.6114"},{"key":"e_1_3_3_47_2","article-title":"Diffwave: A versatile diffusion model for audio synthesis","author":"Kong Zhifeng","year":"2020","unstructured":"Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. 2020. Diffwave: A versatile diffusion model for audio synthesis. 
arXiv preprint arXiv:2009.09761 (2020).","journal-title":"arXiv preprint arXiv:2009.09761"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i2.20014"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.01.029"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01315"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130813"},{"key":"e_1_3_3_52_2","article-title":"Diffusion-LM improves controllable text generation","author":"Li Xiang Lisa","year":"2022","unstructured":"Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, and Tatsunori B. Hashimoto. 2022. Diffusion-LM improves controllable text generation. arXiv preprint arXiv:2205.14217 (2022).","journal-title":"arXiv preprint arXiv:2205.14217"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818013"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01117"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00286"},{"key":"e_1_3_3_56_2","article-title":"A conditional point diffusion-refinement paradigm for 3d point cloud completion","author":"Lyu Zhaoyang","year":"2021","unstructured":"Zhaoyang Lyu, Zhifeng Kong, Xudong Xu, Liang Pan, and Dahua Lin. 2021. A conditional point diffusion-refinement paradigm for 3d point cloud completion. arXiv preprint arXiv:2112.03530 (2021).","journal-title":"arXiv preprint arXiv:2112.03530"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00554"},{"key":"e_1_3_3_58_2","article-title":"Conditional generative adversarial nets","author":"Mirza Mehdi","year":"2014","unstructured":"Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. 
arXiv preprint arXiv:1411.1784 (2014).","journal-title":"arXiv preprint arXiv:1411.1784"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.374"},{"key":"e_1_3_3_60_2","first-page":"8162","volume-title":"International Conference on Machine Learning","author":"Nichol Alexander Quinn","year":"2021","unstructured":"Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning. PMLR, 8162\u20138171."},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/502390.502422"},{"key":"e_1_3_3_62_2","article-title":"Dynamic facial expression generation on Hilbert hypersphere with conditional Wasserstein generative adversarial nets","author":"Otberdout Naima","year":"2020","unstructured":"Naima Otberdout, Mohammed Daoudi, Anis Kacem, Lahoucine Ballihi, and Stefano Berretti. 2020. Dynamic facial expression generation on Hilbert hypersphere with conditional Wasserstein generative adversarial nets. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 2 (2020), 848\u2013863.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01974"},{"key":"e_1_3_3_64_2","article-title":"DiffuseVAE: Efficient, controllable and high-fidelity generation from low-dimensional latents","author":"Pandey Kushagra","year":"2022","unstructured":"Kushagra Pandey, Avideep Mukherjee, Piyush Rai, and Abhishek Kumar. 2022. DiffuseVAE: Efficient, controllable and high-fidelity generation from low-dimensional latents. 
arXiv preprint arXiv:2201.00308 (2022).","journal-title":"arXiv preprint arXiv:2201.00308"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01080"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.1999.791210"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.1089\/big.2016.0028"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58526-6_17"},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01036"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01249-6_50"},{"key":"e_1_3_3_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00078"},{"key":"e_1_3_3_72_2","first-page":"8748","volume-title":"International Conference on Machine Learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_73_2","article-title":"Hierarchical text-conditional image generation with clip latents","author":"Ramesh Aditya","year":"2022","unstructured":"Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022).","journal-title":"arXiv preprint arXiv:2204.06125"},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01219-9_43"},{"key":"e_1_3_3_75_2","first-page":"1530","volume-title":"International Conference on Machine Learning","author":"Rezende Danilo","year":"2015","unstructured":"Danilo Rezende and Shakir Mohamed. 2015. Variational inference with normalizing flows. In International Conference on Machine Learning. 
PMLR, 1530\u20131538."},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_3_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530757"},{"key":"e_1_3_3_79_2","article-title":"Photorealistic text-to-image diffusion models with deep language understanding","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, et\u00a0al. 2022. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487 (2022).","journal-title":"arXiv preprint arXiv:2205.11487"},{"key":"e_1_3_3_80_2","article-title":"Image super-resolution via iterative refinement","author":"Saharia Chitwan","year":"2023","unstructured":"Chitwan Saharia, Jonathan Ho, William Chan, Tim Salimans, David J. Fleet, and Mohammad Norouzi. 2023. Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 4 (2023), 4713\u20134726.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_3_81_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-71002-6_11"},{"key":"e_1_3_3_82_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00926"},{"key":"e_1_3_3_83_2","first-page":"2256","volume-title":"International Conference on Machine Learning","author":"Sohl-Dickstein Jascha","year":"2015","unstructured":"Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning. 
PMLR, 2256\u20132265."},{"key":"e_1_3_3_84_2","first-page":"3483","article-title":"Learning structured output representation using deep conditional generative models","volume":"28","author":"Sohn Kihyuk","year":"2015","unstructured":"Kihyuk Sohn, Honglak Lee, and Xinchen Yan. 2015. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems 28 (2015), 3483\u20133491.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_85_2","article-title":"Denoising diffusion implicit models","author":"Song Jiaming","year":"2020","unstructured":"Jiaming Song, Chenlin Meng, and Stefano Ermon. 2020. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020).","journal-title":"arXiv preprint arXiv:2010.02502"},{"key":"e_1_3_3_86_2","article-title":"Score-based generative modeling through stochastic differential equations","author":"Song Yang","year":"2020","unstructured":"Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020).","journal-title":"arXiv preprint arXiv:2011.13456"},{"key":"e_1_3_3_87_2","article-title":"MotionCLIP: Exposing human motion generation to CLIP space","author":"Tevet Guy","year":"2022","unstructured":"Guy Tevet, Brian Gordon, Amir Hertz, Amit H. Bermano, and Daniel Cohen-Or. 2022. MotionCLIP: Exposing human motion generation to CLIP space. arXiv preprint arXiv:2203.08063 (2022).","journal-title":"arXiv preprint arXiv:2203.08063"},{"key":"e_1_3_3_88_2","volume-title":"The 11th International Conference on Learning Representations","author":"Tevet Guy","year":"2023","unstructured":"Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. 2023. Human motion diffusion model. In The 11th International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=SJ1kSyO2jwu"},{"key":"e_1_3_3_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2993962"},{"key":"e_1_3_3_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00165"},{"key":"e_1_3_3_91_2","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_3_92_2","doi-asserted-by":"publisher","DOI":"10.1145\/1185657.1185864"},{"key":"e_1_3_3_93_2","article-title":"EDICT: Exact diffusion inversion via coupled transformations","author":"Wallace Bram","year":"2022","unstructured":"Bram Wallace, Akash Gokul, and Nikhil Naik. 2022. EDICT: Exact diffusion inversion via coupled transformations. arXiv preprint arXiv:2211.12446 (2022).","journal-title":"arXiv preprint arXiv:2211.12446"},{"key":"e_1_3_3_94_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00740"},{"key":"e_1_3_3_95_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00531"},{"key":"e_1_3_3_96_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093492"},{"key":"e_1_3_3_97_2","article-title":"Diffusion-GAN: Training GANs with diffusion","author":"Wang Zhendong","year":"2022","unstructured":"Zhendong Wang, Huangjie Zheng, Pengcheng He, Weizhu Chen, and Mingyuan Zhou. 2022. Diffusion-GAN: Training GANs with diffusion. arXiv preprint arXiv:2206.02262 (2022).","journal-title":"arXiv preprint arXiv:2206.02262"},{"key":"e_1_3_3_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2962317"},{"key":"e_1_3_3_99_2","article-title":"Diffusion probabilistic modeling for video generation","author":"Yang Ruihan","year":"2022","unstructured":"Ruihan Yang, Prakhar Srivastava, and Stephan Mandt. 2022. 
Diffusion probabilistic modeling for video generation. arXiv preprint arXiv:2203.09481 (2022).","journal-title":"arXiv preprint arXiv:2203.09481"},{"key":"e_1_3_3_100_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2020.3013876"},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3355414"},{"key":"e_1_3_3_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2013.6553788"},{"key":"e_1_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00577"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653455","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3653455","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:44:26Z","timestamp":1750290266000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3653455"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,16]]},"references-count":102,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1,31]]}},"alternative-id":["10.1145\/3653455"],"URL":"https:\/\/doi.org\/10.1145\/3653455","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,16]]},"assertion":[{"value":"2023-03-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-12-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}