{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T14:24:14Z","timestamp":1762525454526,"version":"build-2065373602"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"Institute of Information & Communications Technology Planning & Evaluatio","award":["RS-2024-00439499"],"award-info":[{"award-number":["RS-2024-00439499"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>Manipulating the emotion of a performer in a video is a challenging task. The lip motion needs to be preserved while performing the desired changes in the emotion of the subject; however, simply utilizing existing image-based editing methods sabotages the original lip synchronization. We tackle this problem by utilizing a pretrained StyleGAN paired with a landmark-based editing module that modifies the bias present in the edit direction used in image manipulation. The proposed editing module consists of a latent-based landmark detection network and an editing network that modifies the editing direction to match the original lip synchronization while preserving the desired emotion manipulation results. This is realized by taking the facial landmarks as control points. Both networks operate on the latent space, which enables fast training and inference. We show that the proposed method runs significantly faster and performs better in terms of visual quality than alternative approaches, which was validated through a perceptual study. 
The proposed method can also be extended to perform face reenactment to generate a talking-head video from a single image and face image manipulation using facial landmarks as control points.<\/jats:p>","DOI":"10.1145\/3770576","type":"journal-article","created":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T11:36:04Z","timestamp":1760355364000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Emotion Manipulation for Talking-Head Videos via Facial Landmarks"],"prefix":"10.1145","volume":"45","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0570-4915","authenticated-orcid":false,"given":"Kwanggyoon","family":"Seo","sequence":"first","affiliation":[{"name":"Graduate School of Culture Technology, KAIST","place":["Daejeon, Korea (the Republic of)"]},{"name":"Flawless AI","place":["Daejeon, Korea (the Republic of)"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9050-0862","authenticated-orcid":false,"given":"Rene","family":"Culaway","sequence":"additional","affiliation":[{"name":"KAIST","place":["Daejeon, Korea (the Republic of)"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2664-9863","authenticated-orcid":false,"given":"Byeong-Uk","family":"Lee","sequence":"additional","affiliation":[{"name":"KRAFTON, Inc.","place":["Seoul, Korea (the Republic of)"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1925-3326","authenticated-orcid":false,"given":"Junyong","family":"Noh","sequence":"additional","affiliation":[{"name":"Graduate School of Culture Technology, KAIST","place":["Daejeon, Korea (the Republic 
of)"]}]}],"member":"320","published-online":{"date-parts":[[2025,11,7]]},"reference":[{"key":"e_1_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00832"},{"key":"e_1_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447648"},{"key":"e_1_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2003.1227983"},{"key":"e_1_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.116"},{"key":"e_1_3_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02121"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01565"},{"key":"e_1_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00038"},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14686"},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00432"},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02069"},{"key":"e_1_3_2_13_1","unstructured":"Jianzhu Guo Dingyun Zhang Xiaoqiang Liu Zhizhou Zhong Yuan Zhang Pengfei Wan and Di Zhang. 2024. LivePortrait: Efficient portrait animation with stitching and retargeting control. arXiv:2407.03168. Retrieved from https:\/\/arxiv.org\/abs\/2407.03168 (2024)."},{"key":"e_1_3_2_14_1","doi-asserted-by":"crossref","unstructured":"Jianzhu Guo Xiangyu Zhu Yang Yang Fan Yang Zhen Lei and Stan Z. Li. 2020. Towards fast accurate and stable 3D dense face alignment. In Proceedings of the European Conference on Computer Vision. Springer 152\u2013168.","DOI":"10.1007\/978-3-030-58529-7_10"},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01912"},{"volume-title":"Proceedings of the Advances in Neural Information Processing Systems","year":"2020","key":"e_1_3_2_16_1","unstructured":"Erik H\u00e4rk\u00f6nen, Aaron Hertzmann, Jaakko Lehtinen, and Sylvain Paris. 2020. 
GANSpace: Discovering interpretable GAN controls. In Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530745"},{"key":"e_1_3_2_18_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Ji Xinya","year":"2020","unstructured":"Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, and Feng Xu. 2020. Audio-driven emotional video portraits. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_19_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Karras Tero","year":"2021","unstructured":"Tero Karras, Miika Aittala, Samuli Laine, Erik H\u00e4rk\u00f6nen, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2021. Alias-free generative adversarial networks. In Proceedings of the Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00453"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00813"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00590"},{"key":"e_1_3_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356500"},{"key":"e_1_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201283"},{"key":"e_1_3_2_25_1","unstructured":"Davis E. King. 2009. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10 60 (2009) 1755\u20131758."},{"key":"e_1_3_2_26_1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. 
In Proceedings of the International Conference on Learning Representations, Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687686"},{"key":"e_1_3_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130813"},{"key":"e_1_3_2_29_1","unstructured":"Gaojie Lin Jianwen Jiang Jiaqi Yang Zerong Zheng Chao Liang Yuan Zhang and Jingtuo Liu. 2025. Omnihuman-1: Rethinking the scaling-up of one-stage conditioned human animation models. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 13847\u20131385."},{"key":"e_1_3_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530056"},{"key":"e_1_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0196391"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8803603"},{"key":"e_1_3_2_33_1","doi-asserted-by":"crossref","unstructured":"Yotam Nitzan Kfir Aberman Qiurui He Orly Liba Michal Yarom Yossi Gandelsman Inbar Mosseri Yael Pritch and Daniel Cohen-Or. 2022. Mystyle: A personalized generative prior. ACM Transactions on Graphics 41 6 (2022) 1\u201310.","DOI":"10.1145\/3550454.3555436"},{"key":"e_1_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591500"},{"key":"e_1_3_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01822"},{"key":"e_1_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00209"},{"key":"e_1_3_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00232"},{"key":"e_1_3_2_38_1","doi-asserted-by":"crossref","unstructured":"Daniel Roich Ron Mokady Amit H. Bermano and Daniel Cohen-Or. 2022. Pivotal tuning for latent-based editing of real images. 
ACM Transactions on Graphics 42 1 (2022) 1\u201313.","DOI":"10.1145\/3544777"},{"key":"e_1_3_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2013.59"},{"key":"e_1_3_2_41_1","doi-asserted-by":"crossref","unstructured":"Kwanggyoon Seo Seoung Wug Oh Jingwan Lu Joon-Young Lee Seonghyeon Kim and Junyong Noh. 2022. StylePortraitVideo: Editing portrait videos with expression optimization. Computer Graphics Forum 41 7 (2022) 165\u2013175.","DOI":"10.1111\/cgf.14666"},{"key":"e_1_3_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00926"},{"key":"e_1_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00158"},{"key":"e_1_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00844"},{"key":"e_1_3_2_45_1","first-page":"104","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Solanki Girish Kumar","year":"2022","unstructured":"Girish Kumar Solanki and Anastasios Roussos. 2022. Deep semantic manipulation of facial videos. In Proceedings of the European Conference on Computer Vision. 104\u2013120."},{"key":"e_1_3_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2023.3334511"},{"key":"e_1_3_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00618"},{"key":"e_1_3_2_48_1","unstructured":"Linrui Tian Siqi Hu Qi Wang Bang Zhang and Liefeng Bo. 2025. Emo2: End-effector guided audio-driven avatar video generation. arXiv:2501.10687. Retrieved from https:\/\/arxiv.org\/abs\/2501.10687 (2025)."},{"key":"e_1_3_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459838"},{"key":"e_1_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550469.3555382"},{"key":"e_1_3_2_51_1","doi-asserted-by":"crossref","unstructured":"Jingdong Wang Ke Sun Tianheng Cheng Borui Jiang Chaorui Deng Yang Zhao Dong Liu Yadong Mu Mingkui Tan Xinggang Wang et\u00a0al. 2020a. 
Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 10 (2020a) 3349\u20133364.","DOI":"10.1109\/TPAMI.2020.2983686"},{"key":"e_1_3_2_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58589-1_42"},{"key":"e_1_3_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00991"},{"key":"e_1_3_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00701"},{"key":"e_1_3_2_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_21"},{"key":"e_1_3_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01365"},{"key":"e_1_3_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00441"},{"key":"e_1_3_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01920"},{"key":"e_1_3_2_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610548.3618160"},{"key":"e_1_3_2_60_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Yao Xu","year":"2021","unstructured":"Xu Yao, Alasdair Newson, Yann Gousseau, and Pierre Hellier. 2021. A latent transformer for disentangled and identity-preserving face editing. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_2_61_1","unstructured":"Zipeng Ye Zhiyao Sun Yu-Hui Wen Yanan Sun Tian Lv Ran Yi and Yong-Jin Liu. 2022. Dynamic neural textures: Generating talking-face videos with continuously controllable expressions. arXiv:2204.06180. Retrieved from https:\/\/arxiv.org\/abs\/2204.06180 (2022)."},{"key":"e_1_3_2_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"e_1_3_2_63_1","first-page":"192","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhang Shifeng","year":"2017","unstructured":"Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, and Stan Z Li. 2017. 
S3fd: Single shot scale-invariant face detector. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 192\u2013201."},{"key":"e_1_3_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00366"},{"key":"e_1_3_2_65_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhou Zhenglin","year":"2023","unstructured":"Zhenglin Zhou, Huaxia Li, Hong Liu, Nanyang Wang, Gang Yu, and Rongrong Ji. 2023. STAR loss: Reducing semantic ambiguity in facial landmark detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555520"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3770576","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T14:20:08Z","timestamp":1762525208000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3770576"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,7]]},"references-count":65,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3770576"],"URL":"https:\/\/doi.org\/10.1145\/3770576","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2025,11,7]]},"assertion":[{"value":"2024-07-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-11-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}