{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T04:00:32Z","timestamp":1774929632599,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,3,17]],"date-time":"2023-03-17T00:00:00Z","timestamp":1679011200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"State Key Laboratory of Media Convergence Production Technology and Systems","award":["SKLMCPTS2020012"],"award-info":[{"award-number":["SKLMCPTS2020012"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,3,17]]},"DOI":"10.1145\/3590003.3590004","type":"proceedings-article","created":{"date-parts":[[2023,5,29]],"date-time":"2023-05-29T18:22:56Z","timestamp":1685384576000},"page":"1-5","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Virtual Human Talking-Head Generation"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4024-2680","authenticated-orcid":false,"given":"Wenchao","family":"Song","sequence":"first","affiliation":[{"name":"State Key Laboratory of Media Convergence and Communication, Communication University of China, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4818-741X","authenticated-orcid":false,"given":"Qiang","family":"He","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Media Convergence and Communication, Communication University of China, China and State Key Laboratory of Media Convergence Production Technology and Systems, Xinhua News Agency, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5812-6007","authenticated-orcid":false,"given":"Guowei","family":"Chen","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Media Convergence and Communication, Communication University of China, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,29]]},"reference":[{"issue":"2","key":"e_1_3_2_1_1_1","first-page":"89","article-title":"A review of virtual human synthesis","volume":"17","author":"Wang Zhaoqi","year":"2000","unstructured":"Wang Zhaoqi , \" A review of virtual human synthesis \", Journal of Chinese Academy of Sciences , vol. 17 , no. 2 , pp. 89 , 2000 . Wang Zhaoqi, \"A review of virtual human synthesis\", Journal of Chinese Academy of Sciences, vol. 17, no. 2, pp. 89, 2000.","journal-title":"Journal of Chinese Academy of Sciences"},{"key":"e_1_3_2_1_2_1","first-page":"5","volume-title":"Research on virtual human technology China water transportation","author":"Chen Qixiang","year":"2006","unstructured":"Chen Qixiang and Wei Kejun , Research on virtual human technology China water transportation , Academic , pp. 5 , 2006 . Chen Qixiang and Wei Kejun, Research on virtual human technology China water transportation, Academic, pp. 5, 2006."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Thies J Zollhofer M Stamminger M Face2face: Real-time face capture and reenactment of rgb videos[C]\/\/Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2387-2395.  Thies J Zollhofer M Stamminger M Face2face: Real-time face capture and reenactment of rgb videos[C]\/\/Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2387-2395.","DOI":"10.1109\/CVPR.2016.262"},{"key":"e_1_3_2_1_4_1","first-page":"263","volume-title":"Asian conference on computer vision (ACCV)","author":"Chung A.","year":"2016","unstructured":"J. S. Chung , A. 
Zisserman, Out of time: automated lip sync in the wild , in: Asian conference on computer vision (ACCV) , 2016 , pp. 251\u2013 263 . J. S. Chung, A. Zisserman, Out of time: automated lip sync in the wild, in: Asian conference on computer vision (ACCV), 2016, pp. 251\u2013263."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073640"},{"key":"e_1_3_2_1_6_1","volume-title":"BMVC","author":"Chung A.","year":"2017","unstructured":"J. S. Chung , A. Jamaludin , and A. Zisserman , \u201c You said that? \u201d in BMVC , 2017 . J. S. Chung, A. Jamaludin, and A. Zisserman, \u201cYou said that?\u201d in BMVC, 2017."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073658"},{"key":"e_1_3_2_1_8_1","volume-title":"Kumar K","author":"Kumar R","year":"1801","unstructured":"Kumar R , Sotelo J , Kumar K , Obamanet : Photo-realistic lip-sync from text[J]. arXiv preprint arXiv: 1801 .01442, 2017. Kumar R, Sotelo J, Kumar K, Obamanet: Photo-realistic lip-sync from text[J]. arXiv preprint arXiv:1801.01442, 2017."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Chen L Li Z Maddox R K Lip movements generation at a glance[C]\/\/Proceedings of the European Conference on Computer Vision (ECCV). 2018: 520-535.  Chen L Li Z Maddox R K Lip movements generation at a glance[C]\/\/Proceedings of the European Conference on Computer Vision (ECCV). 2018: 520-535.","DOI":"10.1007\/978-3-030-01234-2_32"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201283"},{"key":"e_1_3_2_1_11_1","unstructured":"Vougioukas K Petridis S Pantic M. End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs[C]\/\/CVPR Workshops. 2019: 37-40.  Vougioukas K Petridis S Pantic M. End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs[C]\/\/CVPR Workshops. 2019: 37-40."},{"key":"e_1_3_2_1_12_1","volume-title":"Talking face generation by conditional recurrent adversarial network[J]. 
arXiv preprint arXiv:1804.04786","author":"Song Y","year":"2018","unstructured":"Song Y , Zhu J , Li D , Talking face generation by conditional recurrent adversarial network[J]. arXiv preprint arXiv:1804.04786 , 2018 . Song Y, Zhu J, Li D, Talking face generation by conditional recurrent adversarial network[J]. arXiv preprint arXiv:1804.04786, 2018."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019299"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00802"},{"key":"e_1_3_2_1_15_1","volume-title":"Mining audio, text and visual information for talking face generation[C]\/\/2019 IEEE International Conference on Data Mining (ICDM)","author":"Yu L","year":"2019","unstructured":"Yu L , Yu J , Ling Q. Mining audio, text and visual information for talking face generation[C]\/\/2019 IEEE International Conference on Data Mining (ICDM) . IEEE , 2019 : 787-795. Yu L, Yu J, Ling Q. Mining audio, text and visual information for talking face generation[C]\/\/2019 IEEE International Conference on Data Mining (ICDM). IEEE, 2019: 787-795."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Cudeiro D Bolkart T Laidlaw C Capture learning and synthesis of 3D speaking styles[C]\/\/Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10101-10111.  Cudeiro D Bolkart T Laidlaw C Capture learning and synthesis of 3D speaking styles[C]\/\/Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10101-10111.","DOI":"10.1109\/CVPR.2019.01034"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323028"},{"issue":"6","key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3414685.3417774","article-title":"Makelttalk: speaker-aware talking-head animation","volume":"39","author":"Zhou X.","year":"2020","unstructured":"Y. Zhou , X. Han , E. Shechtman , J. Echevarria , E. 
Kalogerakis , and D. Li , \u201c Makelttalk: speaker-aware talking-head animation ,\u201d ACM TOG , vol. 39 , no. 6 , pp. 1 \u2013 15 , 2020 . Y. Zhou, X. Han, E. Shechtman, J. Echevarria, E. Kalogerakis, and D. Li, \u201cMakelttalk: speaker-aware talking-head animation,\u201d ACM TOG, vol. 39, no. 6, pp. 1\u201315, 2020.","journal-title":"ACM TOG"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413532"},{"key":"e_1_3_2_1_20_1","volume-title":"Neural voice puppetry: Audio-driven facial reenactment[C]\/\/European conference on computer vision","author":"Thies J","year":"2020","unstructured":"Thies J , Elgharib M , Tewari A , Neural voice puppetry: Audio-driven facial reenactment[C]\/\/European conference on computer vision . Springer , Cham , 2020 : 716-731. Thies J, Elgharib M, Tewari A, Neural voice puppetry: Audio-driven facial reenactment[C]\/\/European conference on computer vision. Springer, Cham, 2020: 716-731."},{"key":"e_1_3_2_1_21_1","first-page":"1985","article-title":"Duallip: A system for joint lip reading and generation","author":"Chen X.","year":"2020","unstructured":"W. Chen , X. Tan , Y. Xia , T. Qin , Y. Wang , and T.-Y. Liu , \u201c Duallip: A system for joint lip reading and generation ,\u201d in ACM MM , 2020 , pp. 1985 \u2013 1993 . W. Chen, X. Tan, Y. Xia, T. Qin, Y. Wang, and T.-Y. Liu, \u201cDuallip: A system for joint lip reading and generation,\u201d in ACM MM, 2020, pp. 1985\u20131993.","journal-title":"ACM MM"},{"key":"e_1_3_2_1_22_1","volume-title":"Liang S","author":"Guo Y","year":"2021","unstructured":"Guo Y , Chen K , Liang S , Ad-nerf : Audio driven neural radiance fields for talking head synthesis[C]\/\/Proceedings of the IEEE\/CVF International Conference on Computer Vision . 2021 : 5784-5794. Guo Y, Chen K, Liang S, Ad-nerf: Audio driven neural radiance fields for talking head synthesis[C]\/\/Proceedings of the IEEE\/CVF International Conference on Computer Vision. 
2021: 5784-5794."},{"key":"e_1_3_2_1_23_1","volume-title":"Zhang Z","author":"Li L","year":"2021","unstructured":"Li L , Wang S , Zhang Z , Write-a-speaker : Text-based emotional and rhythmic talking-head generation[C]\/\/Proceedings of the AAAI Conference on Artificial Intelligence . 2021 , 35(3): 1911-1920. Li L, Wang S, Zhang Z, Write-a-speaker: Text-based emotional and rhythmic talking-head generation[C]\/\/Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(3): 1911-1920."},{"key":"e_1_3_2_1_24_1","volume-title":"Faceformer: Speechdriven 3d facial animation with transformers","author":"Fan Z.","year":"2021","unstructured":"Y. Fan , Z. Lin , J. Saito , W. Wang , and T. Komura , \u201c Faceformer: Speechdriven 3d facial animation with transformers ,\u201d arXiv:2112.05329, 2021 . Y. Fan, Z. Lin, J. Saito, W. Wang, and T. Komura, \u201cFaceformer: Speechdriven 3d facial animation with transformers,\u201d arXiv:2112.05329, 2021."},{"key":"e_1_3_2_1_25_1","volume-title":"AAAI","author":"Yang W.-C.","year":"2022","unstructured":"C.-C. Yang , W.-C. Fan , C.-F. Yang , and Y.-C. F. Wang , \u201cCrossmodal mutual learning for audio-visual speech recognition and manipulation ,\u201d in AAAI , 2022 . C.-C. Yang, W.-C. Fan, C.-F. Yang, and Y.-C. F. Wang, \u201cCrossmodal mutual learning for audio-visual speech recognition and manipulation,\u201d in AAAI, 2022."},{"key":"e_1_3_2_1_26_1","first-page":"2659","volume-title":"Speech and Signal Processing (ICASSP)","author":"Zhang J.","year":"2022","unstructured":"S. Zhang , J. Yuan , M. Liao and L. Zhang , \" Text2video: Text-Driven Talking-Head Video Synthesis with Personalized Phoneme - Pose Dictionary,\" ICASSP 2022 - 2022 IEEE International Conference on Acoustics , Speech and Signal Processing (ICASSP) , 2022 , pp. 2659 - 2266 . S. Zhang, J. Yuan, M. Liao and L. 
Zhang, \"Text2video: Text-Driven Talking-Head Video Synthesis with Personalized Phoneme - Pose Dictionary,\" ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 2659-266."},{"key":"e_1_3_2_1_27_1","volume-title":"Video rewrite: Driving visual speech with audio[C]\/\/Proceedings of the 24th annual conference on Computer graphics and interactive techniques. 1997: 353-360","author":"Bregler C","unstructured":"Bregler C , Covell M , Slaney M. Video rewrite: Driving visual speech with audio[C]\/\/Proceedings of the 24th annual conference on Computer graphics and interactive techniques. 1997: 353-360 . Bregler C, Covell M, Slaney M. Video rewrite: Driving visual speech with audio[C]\/\/Proceedings of the 24th annual conference on Computer graphics and interactive techniques. 1997: 353-360."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3142387"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130813"},{"key":"e_1_3_2_1_30_1","first-page":"7247","article-title":"Robust and Efficient Speech-to-Animation[C]\/\/ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"2022","author":"Chen L","unstructured":"Chen L , Wu Z , Ling J , Transformer -S2 A : Robust and Efficient Speech-to-Animation[C]\/\/ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE , 2022 : 7247 - 7251 . Chen L, Wu Z, Ling J, Transformer-S2A: Robust and Efficient Speech-to-Animation[C]\/\/ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 
IEEE, 2022: 7247-7251.","journal-title":"IEEE"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Hong\n      Y Peng\n      B Xiao H Headnerf: A real-time nerf-based parametric head model[C]\/\/Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.\n  2022\n  : \n  20374\n  -\n  20384\n  .  Hong Y Peng B Xiao H Headnerf: A real-time nerf-based parametric head model[C]\/\/Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2022: 20374-20384.","DOI":"10.1109\/CVPR52688.2022.01973"},{"key":"e_1_3_2_1_32_1","volume-title":"DONeRF: Towards Real\u2010Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks[C]\/\/Computer Graphics Forum","author":"Neff T","year":"2021","unstructured":"Neff T , Stadlbauer P , Parger M , DONeRF: Towards Real\u2010Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks[C]\/\/Computer Graphics Forum . 2021 , 40(4): 45-59. Neff T, Stadlbauer P, Parger M, DONeRF: Towards Real\u2010Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks[C]\/\/Computer Graphics Forum. 2021, 40(4): 45-59."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Yu A Li R Tancik M Plenoctrees for real-time rendering of neural radiance fields[C]\/\/Proceedings of the IEEE\/CVF International Conference on Computer Vision. 2021: 5752-5761.  Yu A Li R Tancik M Plenoctrees for real-time rendering of neural radiance fields[C]\/\/Proceedings of the IEEE\/CVF International Conference on Computer Vision. 2021: 5752-5761.","DOI":"10.1109\/ICCV48922.2021.00570"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Martin-Brualla\n      R Radwan\n      N Sajjadi M S M Nerf in the wild: Neural radiance fields for unconstrained photo collections[C]\/\/Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.\n  2021\n  : \n  7210\n  -\n  7219\n  .  
Martin-Brualla R Radwan N Sajjadi M S M Nerf in the wild: Neural radiance fields for unconstrained photo collections[C]\/\/Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 2021: 7210-7219.","DOI":"10.1109\/CVPR46437.2021.00713"},{"key":"e_1_3_2_1_35_1","volume-title":"Qiao X","author":"Huang Y","year":"2021","unstructured":"Huang Y , Zhu Y , Qiao X , Aitransfer : Progressive ai-powered transmission for real-time point cloud video streaming[C]\/\/Proceedings of the 29th ACM International Conference on Multimedia . 2021 : 3989-3997. Huang Y, Zhu Y, Qiao X, Aitransfer: Progressive ai-powered transmission for real-time point cloud video streaming[C]\/\/Proceedings of the 29th ACM International Conference on Multimedia. 2021: 3989-3997."}],"event":{"name":"CACML 2023: 2023 2nd Asia Conference on Algorithms, Computing and Machine Learning","location":"Shanghai China","acronym":"CACML 2023"},"container-title":["Proceedings of the 2023 2nd Asia Conference on Algorithms, Computing and Machine 
Learning"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3590003.3590004","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3590003.3590004","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:16Z","timestamp":1750183756000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3590003.3590004"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,17]]},"references-count":35,"alternative-id":["10.1145\/3590003.3590004","10.1145\/3590003"],"URL":"https:\/\/doi.org\/10.1145\/3590003.3590004","relation":{},"subject":[],"published":{"date-parts":[[2023,3,17]]},"assertion":[{"value":"2023-05-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}