{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T21:55:48Z","timestamp":1776117348136,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":70,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,8,7]],"date-time":"2022-08-07T00:00:00Z","timestamp":1659830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Beijing Natural Science Foundation","award":["JQ19015"],"award-info":[{"award-number":["JQ19015"]}]},{"name":"NSFC","award":["62025108,62021002, 61727808"],"award-info":[{"award-number":["62025108,62021002, 61727808"]}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018YFA0704000"],"award-info":[{"award-number":["2018YFA0704000"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,8,7]]},"DOI":"10.1145\/3528233.3530745","type":"proceedings-article","created":{"date-parts":[[2022,7,20]],"date-time":"2022-07-20T13:56:43Z","timestamp":1658325403000},"page":"1-10","source":"Crossref","is-referenced-by-count":134,"title":["EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model"],"prefix":"10.1145","author":[{"given":"Xinya","family":"Ji","sequence":"first","affiliation":[{"name":"Nanjing University, China"}]},{"given":"Hang","family":"Zhou","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, China"}]},{"given":"Kaisiyuan","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Sydney, Australia"}]},{"given":"Qianyi","family":"Wu","sequence":"additional","affiliation":[{"name":"Monash University, 
Australia"}]},{"given":"Wayne","family":"Wu","sequence":"additional","affiliation":[{"name":"SenseTime Research, China"}]},{"given":"Feng","family":"Xu","sequence":"additional","affiliation":[{"name":"BNRist and school of software, Tsinghua University, China"}]},{"given":"Xun","family":"Cao","sequence":"additional","affiliation":[{"name":"Nanjing University, China"}]}],"member":"320","published-online":{"date-parts":[[2022,8,7]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379337.3415877"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2503385.2503473"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision. 609\u2013617","author":"Arandjelovic Relja","year":"2017","unstructured":"Relja Arandjelovic and Andrew Zisserman. 2017. Look, listen and learn. In Proceedings of the IEEE International Conference on Computer Vision. 609\u2013617."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/311535.311537"},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of the 24th annual conference on Computer graphics and interactive techniques. 353\u2013360","author":"Bregler Christoph","year":"1997","unstructured":"Christoph Bregler, Michele Covell, and Malcolm Slaney. 1997a. Video rewrite: Driving visual speech with audio. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques.
353\u2013360."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/258734.258880"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 13786\u201313795","author":"Burkov Egor","year":"2020","unstructured":"Egor Burkov, Igor Pasechnik, Artur Grigorev, and Victor Lempitsky. 2020. Neural head reenactment with latent pose descriptors. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 13786\u201313795."},{"key":"e_1_3_2_1_8_1","volume-title":"Crema-d: Crowd-sourced emotional multimodal actors dataset","author":"Cao Houwei","year":"2014","unstructured":"Houwei Cao, David\u00a0G Cooper, Michael\u00a0K Keutmann, Ruben\u00a0C Gur, Ani Nenkova, and Ragini Verma. 2014. Crema-d: Crowd-sourced emotional multimodal actors dataset. IEEE transactions on affective computing 5, 4 (2014), 377\u2013390."},{"key":"e_1_3_2_1_9_1","volume-title":"European Conference on Computer Vision. Springer, 103\u2013120","author":"Chai Lucy","year":"2020","unstructured":"Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. 2020. What makes fake images detectable? understanding properties that generalize. In European Conference on Computer Vision.
Springer, 103\u2013120."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7832\u20137841","author":"Chen Lele","year":"2019","unstructured":"Lele Chen, Ross\u00a0K Maddox, Zhiyao Duan, and Chenliang Xu. 2019a. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7832\u20137841."},{"key":"e_1_3_2_1_11_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7832\u20137841","author":"Chen Lele","year":"2019","unstructured":"Lele Chen, Ross\u00a0K Maddox, Zhiyao Duan, and Chenliang Xu. 2019b. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7832\u20137841."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 13518\u201313527","author":"Chen Zhuo","year":"2020","unstructured":"Zhuo Chen, Chaoyue Wang, Bo Yuan, and Dacheng Tao. 2020.
Puppeteergan: Arbitrary portrait animation with semantic-aware appearance transformation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 13518\u201313527."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Joon\u00a0Son Chung, Amir Jamaludin, and Andrew Zisserman. 2017. You said that? arXiv preprint arXiv:1705.02966 (2017).","DOI":"10.5244\/C.31.109"},{"key":"e_1_3_2_1_14_1","volume-title":"Asian conference on computer vision. Springer, 87\u2013103","author":"Chung Joon\u00a0Son","year":"2016","unstructured":"Joon\u00a0Son Chung and Andrew Zisserman. 2016a. Lip reading in the wild. In Asian conference on computer vision. Springer, 87\u2013103."},{"key":"e_1_3_2_1_15_1","volume-title":"Workshop on Multi-view Lip-reading, ACCV.","author":"Chung S.","unstructured":"J.\u00a0S. Chung and A. Zisserman. 2016b. Out of time: automated lip sync in the wild. In Workshop on Multi-view Lip-reading, ACCV."},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 3703\u20133712","author":"Cole Forrester","year":"2017","unstructured":"Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, and William\u00a0T Freeman. 2017. Synthesizing normalized faces from facial identity features.
In Proceedings of the IEEE conference on computer vision and pattern recognition. 3703\u20133712."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925984"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323028"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/AVSS.2018.8639163"},{"key":"e_1_3_2_1_20_1","volume-title":"Proceedings, Part XIX 16","author":"Guo Jianzhu","year":"2020","unstructured":"Jianzhu Guo, Xiangyu Zhu, Yang Yang, Fan Yang, Zhen Lei, and Stan\u00a0Z Li. 2020. Towards fast, accurate and stable 3d dense face alignment. In Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XIX 16. Springer, 152\u2013168."},{"key":"e_1_3_2_1_21_1","volume-title":"Long short-term memory. Neural computation 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735\u20131780."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7084\u20137092","author":"Huang Po-Hsiang","year":"2020","unstructured":"Po-Hsiang Huang, Fu-En Yang, and Yu-Chiang\u00a0Frank Wang. 2020.
Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 7084\u20137092."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.167"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.632"},{"key":"e_1_3_2_1_25_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 14080\u201314089","author":"Ji Xinya","year":"2021","unstructured":"Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen\u00a0Change Loy, Xun Cao, and Feng Xu. 2021. Audio-driven emotional video portraits. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 14080\u201314089."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_43"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073658"},{"key":"e_1_3_2_1_28_1","first-page":"1","article-title":"Neural style-preserving visual dubbing","volume":"38","author":"Kim Hyeongwoo","year":"2019","unstructured":"Hyeongwoo Kim, Mohamed Elgharib, Michael Zollh\u00f6fer, Hans-Peter Seidel, Thabo Beeler, Christian Richardt, and Christian Theobalt. 2019. Neural style-preserving visual dubbing.
ACM Transactions on Graphics (TOG) 38, 6 (2019), 1\u201313.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_3_2_1_29_1","first-page":"1","article-title":"Deep video portraits","volume":"37","author":"Kim Hyeongwoo","year":"2018","unstructured":"Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Niessner, Patrick P\u00e9rez, Christian Richardt, Michael Zollh\u00f6fer, and Christian Theobalt. 2018. Deep video portraits. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1\u201314.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_3_2_1_30_1","volume-title":"Cooperative learning of audio and video models from self-supervised synchronization. Advances in Neural Information Processing Systems 31","author":"Korbar Bruno","year":"2018","unstructured":"Bruno Korbar, Du Tran, and Lorenzo Torresani. 2018. Cooperative learning of audio and video models from self-supervised synchronization. Advances in Neural Information Processing Systems 31 (2018)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00505"},{"key":"e_1_3_2_1_32_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a035","author":"Li Lincheng","year":"2021","unstructured":"Lincheng Li, Suzhen Wang, Zhimeng Zhang, Yu Ding, Yixing Zheng, Xin Yu, and Changjie Fan. 2021. Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation.
In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a035. 1911\u20131920."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0196391"},{"key":"e_1_3_2_1_34_1","volume-title":"International Symposium on Music Information Retrieval. Citeseer.","author":"Logan Beth","year":"2000","unstructured":"Beth Logan. 2000. Mel frequency cepstral coefficients for music modeling. In International Symposium on Music Information Retrieval. Citeseer."},{"key":"e_1_3_2_1_35_1","article-title":"Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation","volume":"40","author":"Lu Yuanxun","year":"2021","unstructured":"Yuanxun Lu, Jinxiang Chai, and Xun Cao. 2021. Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation. ACM Transactions on Graphics 40, 6 (2021), 17\u00a0pages. https:\/\/doi.org\/10.1145\/3478513.3480484","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_2_1_36_1","volume-title":"The Chicago face database: A free stimulus set of faces and norming data. Behavior research methods 47, 4","author":"Ma S","year":"2015","unstructured":"
Debbie\u00a0S Ma, Joshua Correll, and Bernd Wittenbrink. 2015. The Chicago face database: A free stimulus set of faces and norming data. Behavior research methods 47, 4 (2015), 1122\u20131135."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8803603"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_24"},{"key":"e_1_3_2_1_39_1","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 3290\u20133298","author":"Mittal Gaurav","year":"2020","unstructured":"Gaurav Mittal and Baoyuan Wang. 2020. Animating face using disentangled audio representations. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 3290\u20133298."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_48"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413532"},{"key":"e_1_3_2_1_42_1","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 41\u201350","author":"Richard Alexander","year":"2021","unstructured":"Alexander Richard, Colin Lea, Shugao Ma, Jurgen Gall, Fernando De\u00a0la Torre, and Yaser Sheikh. 2021. Audio- and gaze-driven facial animation of codec avatars. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision.
41\u201350."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00009"},{"key":"e_1_3_2_1_44_1","volume-title":"Speech-driven expressive talking lips with conditional sequential generative adversarial networks","author":"Sadoughi Najmeh","year":"2019","unstructured":"Najmeh Sadoughi and Carlos Busso. 2019. Speech-driven expressive talking lips with conditional sequential generative adversarial networks. IEEE Transactions on Affective Computing (2019)."},{"key":"e_1_3_2_1_45_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2377\u20132386","author":"Siarohin Aliaksandr","year":"2019","unstructured":"Aliaksandr Siarohin, St\u00e9phane Lathuili\u00e8re, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019a. Animating arbitrary objects via deep motion transfer. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2377\u20132386."},{"key":"e_1_3_2_1_46_1","first-page":"7137","article-title":"First order motion model for image animation","volume":"32","author":"Siarohin Aliaksandr","year":"2019","unstructured":"Aliaksandr Siarohin, St\u00e9phane Lathuili\u00e8re, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019b. First order motion model for image animation.
Advances in Neural Information Processing Systems 32 (2019), 7137\u20137147.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073640"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58517-4_42"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2929464.2929475"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00137"},{"key":"e_1_3_2_1_51_1","unstructured":"Konstantinos Vougioukas, Stavros Petridis, and Maja Pantic. 2018. End-to-end speech-driven facial animation with temporal gans. arXiv preprint arXiv:1805.09313 (2018)."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01251-8"},{"key":"e_1_3_2_1_53_1","volume-title":"European Conference on Computer Vision. Springer, 700\u2013717","author":"Wang Kaisiyuan","year":"2020","unstructured":"Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wayne Wu, Chen Qian, Ran He, Yu Qiao, and Chen\u00a0Change Loy. 2020b. Mead: A large-scale audio-visual dataset for emotional talking-face generation. In European Conference on Computer Vision. Springer, 700\u2013717."},{"key":"e_1_3_2_1_54_1","volume-title":"2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4529\u20134532","author":"Wang Lijuan","year":"2012","unstructured":"Lijuan Wang, Wei Han, and Frank\u00a0K Soong. 2012.
High quality lip-sync animation for 3D photo-realistic talking head. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4529\u20134532."},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Suzhen Wang, Lincheng Li, Yu Ding, Changjie Fan, and Xin Yu. 2021a. Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion. arXiv preprint arXiv:2107.09293 (2021).","DOI":"10.24963\/ijcai.2021\/152"},{"key":"e_1_3_2_1_56_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 8695\u20138704","author":"Wang Sheng-Yu","year":"2020","unstructured":"Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei\u00a0A Efros. 2020a. Cnn-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 8695\u20138704."},{"key":"e_1_3_2_1_57_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 10039\u201310049","author":"Wang Ting-Chun","year":"2021","unstructured":"Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu. 2021b. One-shot free-view neural talking-head synthesis for video conferencing.
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 10039\u201310049."},{"key":"e_1_3_2_1_58_1","volume-title":"Image quality assessment: from error visibility to structural similarity","author":"Wang Zhou","year":"2004","unstructured":"Zhou Wang, Alan\u00a0C Bovik, Hamid\u00a0R Sheikh, and Eero\u00a0P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600\u2013612."},{"key":"e_1_3_2_1_59_1","volume-title":"Proceedings of the European conference on computer vision (ECCV). 603\u2013619","author":"Wu Wayne","year":"2018","unstructured":"Wayne Wu, Yunxuan Zhang, Cheng Li, Chen Qian, and Chen\u00a0Change Loy. 2018. Reenactgan: Learning to reenact faces via boundary transfer. In Proceedings of the European conference on computer vision (ECCV). 603\u2013619."},{"key":"e_1_3_2_1_60_1","volume-title":"Proceedings of the 28th ACM International Conference on Multimedia. 1773\u20131781","author":"Yao Guangming","year":"2020","unstructured":"
Guangming Yao, Yi Yuan, Tianjia Shao, and Kun Zhou. 2020. Mesh guided one-shot face reenactment using graph convolutional networks. In Proceedings of the 28th ACM International Conference on Multimedia. 1773\u20131781."},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3449063"},{"key":"e_1_3_2_1_62_1","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision. 7556\u20137566","author":"Yu Ning","year":"2019","unstructured":"Ning Yu, Larry\u00a0S Davis, and Mario Fritz. 2019. Attributing fake images to gans: Learning and analyzing gan fingerprints. In Proceedings of the IEEE\/CVF international conference on computer vision. 7556\u20137566."},{"key":"e_1_3_2_1_63_1","volume-title":"European Conference on Computer Vision. Springer, 524\u2013540","author":"Zakharov Egor","year":"2020","unstructured":"Egor Zakharov, Aleksei Ivakhnenko, Aliaksandra Shysheya, and Victor Lempitsky. 2020. Fast bi-layer neural synthesis of one-shot realistic head avatars. In European Conference on Computer Vision. Springer, 524\u2013540."},{"key":"e_1_3_2_1_64_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5326\u20135335","author":"Zhang Jiangning","year":"2020","unstructured":"
Jiangning Zhang, Xianfang Zeng, Mengmeng Wang, Yusu Pan, Liang Liu, Yong Liu, Yu Ding, and Changjie Fan. 2020. Freenet: Multi-identity face reenactment. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 5326\u20135335."},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019299"},{"key":"e_1_3_2_1_66_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4176\u20134186","author":"Zhou Hang","year":"2021","unstructured":"Hang Zhou, Yasheng Sun, Wayne Wu, Chen\u00a0Change Loy, Xiaogang Wang, and Ziwei Liu. 2021. Pose-controllable talking face generation by implicitly modularized audio-visual representation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4176\u20134186."},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3414685.3417774","article-title":"MakeItTalk: speaker-aware talking-head animation","volume":"39","author":"Zhou Yang","year":"2020","unstructured":"Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, and Dingzeyu Li. 2020. MakeItTalk: speaker-aware talking-head animation. ACM Transactions on Graphics (TOG) 39, 6 (2020), 1\u201315.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_3_2_1_68_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision.
14800\u201314809","author":"Zhou Yipin","year":"2021","unstructured":"Yipin Zhou and Ser-Nam Lim. 2021. Joint Audio-Visual Deepfake Detection. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 14800\u201314809."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3197517.3201292","article-title":"Visemenet: Audio-driven animator-centric speech animation","volume":"37","author":"Zhou Yang","year":"2018","unstructured":"Yang Zhou, Zhan Xu, Chris Landreth, Evangelos Kalogerakis, Subhransu Maji, and Karan Singh. 2018. Visemenet: Audio-driven animator-centric speech animation. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1\u201310.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_3_2_1_70_1","volume-title":"Computer Graphics Forum, Vol.\u00a037","author":"Zollh\u00f6fer Michael","unstructured":"Michael Zollh\u00f6fer, Justus Thies, Pablo Garrido, Derek Bradley, Thabo Beeler, Patrick P\u00e9rez, Marc Stamminger, Matthias Nie\u00dfner, and Christian Theobalt. 2018. State of the art on monocular 3D face reconstruction, tracking, and applications. In Computer Graphics Forum, Vol.\u00a037.
Wiley Online Library, 523\u2013550."}],"event":{"name":"SIGGRAPH '22: Special Interest Group on Computer Graphics and Interactive Techniques Conference","location":"Vancouver BC Canada","acronym":"SIGGRAPH '22","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3528233.3530745","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:42Z","timestamp":1750186962000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3528233.3530745"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,7]]},"references-count":70,"alternative-id":["10.1145\/3528233.3530745","10.1145\/3528233"],"URL":"https:\/\/doi.org\/10.1145\/3528233.3530745","relation":{},"subject":[],"published":{"date-parts":[[2022,8,7]]}}}