{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T01:05:37Z","timestamp":1777597537952,"version":"3.51.4"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2019,11,8]],"date-time":"2019-11-08T00:00:00Z","timestamp":1573171200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Max-Planck-Gesellschaft","award":["MPC-VCC"],"award-info":[{"award-number":["MPC-VCC"]}]},{"DOI":"10.13039\/100011199","name":"European Research Council","doi-asserted-by":"publisher","award":["770784"],"award-info":[{"award-number":["770784"]}],"id":[{"id":"10.13039\/100011199","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/S001050\/1"],"award-info":[{"award-number":["EP\/S001050\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000690","name":"Research Councils UK","doi-asserted-by":"publisher","award":["EP\/M023281\/1"],"award-info":[{"award-number":["EP\/M023281\/1"]}],"id":[{"id":"10.13039\/501100000690","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>Dubbing is a technique for translating video content from one language to another. However, state-of-the-art visual dubbing techniques directly copy facial expressions from source to target actors without considering identity-specific idiosyncrasies such as a unique type of smile. We present a style-preserving visual dubbing approach from single video inputs, which maintains the signature style of target actors when modifying facial expressions, including mouth motions, to match foreign languages. At the heart of our approach is the concept of motion style, in particular for facial expressions, i.e., the person-specific expression change that is yet another essential factor beyond visual accuracy in face editing applications. Our method is based on a recurrent generative adversarial network that captures the spatiotemporal co-activation of facial expressions, and enables generating and modifying the facial expressions of the target actor while preserving their style. We train our model with unsynchronized source and target videos in an unsupervised manner using cycle-consistency and mouth expression losses, and synthesize photorealistic video frames using a layered neural face renderer. Our approach generates temporally coherent results, and handles dynamic backgrounds. Our results show that our dubbing approach maintains the idiosyncratic style of the target actor better than previous approaches, even for widely differing source and target actors.<\/jats:p>","DOI":"10.1145\/3355089.3356500","type":"journal-article","created":{"date-parts":[[2019,11,8]],"date-time":"2019-11-08T20:27:58Z","timestamp":1573244878000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":62,"title":["Neural style-preserving visual dubbing"],"prefix":"10.1145","volume":"38","author":[{"given":"Hyeongwoo","family":"Kim","sequence":"first","affiliation":[{"name":"Max Planck Institute for Informatics"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohamed","family":"Elgharib","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Zollh\u00f6fer","sequence":"additional","affiliation":[{"name":"Stanford University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hans-Peter","family":"Seidel","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thabo","family":"Beeler","sequence":"additional","affiliation":[{"name":"DisneyResearch|Studios"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian","family":"Richardt","sequence":"additional","affiliation":[{"name":"University of Bath"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian","family":"Theobalt","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Informatics"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,11,8]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https:\/\/www.tensorflow.org\/ Software available from tensorflow.org."},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2010.65"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8659.t01-1-00712"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","unstructured":"Volker Blanz and Thomas Vetter. 1999. A Morphable Model for the Synthesis of 3D Faces. In SIGGRAPH. 187--194. 10.1145\/311535.311556","DOI":"10.1145\/311535.311556"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","unstructured":"Matthew Brand. 1999. Voice Puppetry. In SIGGRAPH. 21--28. 10.1145\/311535.311537","DOI":"10.1145\/311535.311537"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2013.249"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2868527"},{"key":"e_1_2_2_8_1","volume-title":"British Machine Vision Conference (BMVC).","author":"Chung Joon Son","year":"2017","unstructured":"Joon Son Chung, Amir Jamaludin, and Andrew Zisserman. 2017. You said that?. In British Machine Vision Conference (BMVC)."},{"key":"e_1_2_2_9_1","volume-title":"Symposium on Computer Animation (SCA). 251--260","author":"Deng Zhigang","year":"2006","unstructured":"Zhigang Deng and Ulrich Neumann. 2006. eFASE: Expressive Facial Animation Synthesis and Editing with Phoneme-isomap Controls. In Symposium on Computer Animation (SCA). 251--260."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.12552"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2890493"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275043"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.632"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073658"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201283"},{"key":"e_1_2_2_18_1","volume-title":"International Conference on Machine Learning (ICML). https:\/\/arxiv.org\/abs\/1703","author":"Kim Taeksoo","year":"2017","unstructured":"Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, and Jiwon Kim. 2017. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. In International Conference on Machine Learning (ICML). https:\/\/arxiv.org\/abs\/1703.05192"},{"key":"e_1_2_2_19_1","volume-title":"Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR).","author":"Diederik","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8659.t01-2-00711"},{"key":"e_1_2_2_21_1","unstructured":"Bertrand Le Goff Thierry Guiard-Marigny Michael M. Cohen and Christian Beno\u00eet. 1994. Real-time analysis-synthesis and intelligibility of talking faces. In SSW. https:\/\/www.isca-speech.org\/archive_open\/ssw2\/ssw2_053.html"},{"key":"e_1_2_2_22_1","unstructured":"Ming-Yu Liu Thomas Breuel and Jan Kautz. 2017. Unsupervised Image-to-Image Translation Networks. In Advances in Neural Information Processing Systems (NIPS). https:\/\/github.com\/mingyuliutw\/unit"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2006.18"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13586"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275075"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1044\/jshr.2803.381"},{"key":"e_1_2_2_27_1","volume-title":"International Conference on Machine Learning (ICML). https:\/\/arxiv.org\/abs\/1211","author":"Pascanu Razvan","year":"2013","unstructured":"Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training Recurrent Neural Networks. In International Conference on Machine Learning (ICML). https:\/\/arxiv.org\/abs\/1211.5063"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.287"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/280814.280825"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-010-0380-4"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073640"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073699"},{"key":"e_1_2_2_33_1","volume-title":"Dynamic Units of Visual Speech. In Symposium on Computer Animation (SCA). 275--284","author":"Taylor Sarah L.","year":"2012","unstructured":"Sarah L. Taylor, Moshe Mahler, Barry-John Theobald, and Iain Matthews. 2012. Dynamic Units of Visual Speech. In Symposium on Computer Animation (SCA). 275--284."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2929464.2929475"},{"key":"e_1_2_2_35_1","volume-title":"End-to-End Speech-Driven Facial Animation with Temporal GANs. In British Machine Vision Conference (BMVC).","author":"Vougioukas Konstantinos","year":"2018","unstructured":"Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. 2018. End-to-End Speech-Driven Facial Animation with Temporal GANs. In British Machine Vision Conference (BMVC)."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_41"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.310"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3355089.3356500","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3355089.3356500","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:41Z","timestamp":1750203881000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3355089.3356500"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,8]]},"references-count":38,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3355089.3356500"],"URL":"https:\/\/doi.org\/10.1145\/3355089.3356500","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,8]]},"assertion":[{"value":"2019-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}