{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T20:20:33Z","timestamp":1770754833912,"version":"3.50.0"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,12,19]],"date-time":"2021-12-19T00:00:00Z","timestamp":1639872000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,19]]},"DOI":"10.1145\/3490035.3490305","type":"proceedings-article","created":{"date-parts":[[2021,12,14]],"date-time":"2021-12-14T23:15:16Z","timestamp":1639523716000},"page":"1-9","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Realistic talking face animation with speech-induced head motion"],"prefix":"10.1145","author":[{"given":"Sandika","family":"Biswas","sequence":"first","affiliation":[{"name":"TCS Research, India"}]},{"given":"Sanjana","family":"Sinha","sequence":"additional","affiliation":[{"name":"TCS Research, India"}]},{"given":"Dipanjan","family":"Das","sequence":"additional","affiliation":[{"name":"TCS Research, India"}]},{"given":"Brojeshwar","family":"Bhowmick","sequence":"additional","affiliation":[{"name":"TCS Research, India"}]}],"member":"320","published-online":{"date-parts":[[2021,12,19]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Joon Son Chung, and Andrew Zisserman","author":"Afouras Triantafyllos","year":"2018","unstructured":"Triantafyllos Afouras , Joon Son Chung, and Andrew Zisserman . 2018 . LRS 3-TED: a large-scale dataset for visual speech recognition. arXiv preprint arXiv:1809.00496 (2018). Triantafyllos Afouras, Joon Son Chung, and Andrew Zisserman. 2018. LRS3-TED: a large-scale dataset for visual speech recognition. arXiv preprint arXiv:1809.00496 (2018)."},{"key":"e_1_3_2_1_2_1","volume-title":"Wasserstein gan. arXiv preprint arXiv:1701.07875","author":"Arjovsky Martin","year":"2017","unstructured":"Martin Arjovsky , Soumith Chintala , and L\u00e9on Bottou . 2017. Wasserstein gan. arXiv preprint arXiv:1701.07875 ( 2017 ). Martin Arjovsky, Soumith Chintala, and L\u00e9on Bottou. 2017. Wasserstein gan. arXiv preprint arXiv:1701.07875 (2017)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2018.00019"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.885910"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1089870.1089884"},{"key":"e_1_3_2_1_6_1","volume-title":"Talking-head Generation with Rhythmic Head Motion. In European Conference on Computer Vision.","author":"Chen Lele","year":"2020","unstructured":"Lele Chen , Guofeng Cui , Celong Liu , Zhong Li , Ziyi Kou , Yi Xu , and Chenliang Xu . 2020 . Talking-head Generation with Rhythmic Head Motion. In European Conference on Computer Vision. Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, and Chenliang Xu. 2020. Talking-head Generation with Rhythmic Head Motion. In European Conference on Computer Vision."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_32"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00802"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126686.3126723"},{"key":"e_1_3_2_1_10_1","volume-title":"You said that? arXiv preprint arXiv:1705.02966","author":"Chung Joon Son","year":"2017","unstructured":"Joon Son Chung , Amir Jamaludin , and Andrew Zisserman . 2017. You said that? arXiv preprint arXiv:1705.02966 ( 2017 ). Joon Son Chung, Amir Jamaludin, and Andrew Zisserman. 2017. You said that? arXiv preprint arXiv:1705.02966 (2017)."},{"key":"e_1_3_2_1_11_1","volume-title":"Voxceleb2: Deep speaker recognition. arXiv preprint arXiv:1806.05622","author":"Chung Joon Son","year":"2018","unstructured":"Joon Son Chung , Arsha Nagrani , and Andrew Zisserman . 2018. Voxceleb2: Deep speaker recognition. arXiv preprint arXiv:1806.05622 ( 2018 ). Joon Son Chung, Arsha Nagrani, and Andrew Zisserman. 2018. Voxceleb2: Deep speaker recognition. arXiv preprint arXiv:1806.05622 (2018)."},{"key":"e_1_3_2_1_12_1","volume-title":"Workshop on Multi-view Lip-reading, ACCV.","author":"Chung J. S.","unstructured":"J. S. Chung and A. Zisserman . 2016. Out of time: automated lip sync in the wild . In Workshop on Multi-view Lip-reading, ACCV. J. S. Chung and A. Zisserman. 2016. Out of time: automated lip sync in the wild. In Workshop on Multi-view Lip-reading, ACCV."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58577-8_25"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_3_2_1_15_1","volume-title":"Simoncelli","author":"Ding Keyan","year":"2020","unstructured":"Keyan Ding , Kede Ma , Shiqi Wang , and Eero P . Simoncelli . 2020 . Image Quality Assessment: Unifying Structure and Texture Similarity. CoRR abs\/2004.07728 (2020). https:\/\/arxiv.org\/abs\/2004.07728 Keyan Ding, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. 2020. Image Quality Assessment: Unifying Structure and Texture Similarity. CoRR abs\/2004.07728 (2020). https:\/\/arxiv.org\/abs\/2004.07728"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305498"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/874061.875416"},{"key":"e_1_3_2_1_18_1","unstructured":"Awni Hannun Carl Case Jared Casper Bryan Catanzaro Greg Diamos Erich Elsen Ryan Prenger Sanjeev Satheesh Shubho Sengupta Adam Coates etal 2014. Deep speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014).  Awni Hannun Carl Case Jared Casper Bryan Catanzaro Greg Diamos Erich Elsen Ryan Prenger Sanjeev Satheesh Shubho Sengupta Adam Coates et al. 2014. Deep speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567 (2014)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2407694"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295408"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.167"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_43"},{"key":"e_1_3_2_1_23_1","volume-title":"Sixth European Conference on Speech Communication and Technology.","author":"Kuratate Takaaki","year":"1999","unstructured":"Takaaki Kuratate , Kevin G Munhall , Philip E Rubin , Eric Vatikiotis-Bateson , and Hani Yehia . 1999 . Audio-visual synthesis of talking faces from speech production correlates . In Sixth European Conference on Speech Communication and Technology. Takaaki Kuratate, Kevin G Munhall, Philip E Rubin, Eric Vatikiotis-Bateson, and Hani Yehia. 1999. Audio-visual synthesis of talking faces from speech production correlates. In Sixth European Conference on Speech Communication and Technology."},{"key":"e_1_3_2_1_24_1","volume-title":"Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio.","author":"Lin Zhouhan","year":"2017","unstructured":"Zhouhan Lin , Minwei Feng , Cicero Nogueira dos Santos , Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017 . A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130 (2017). Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. 2017. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130 (2017)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.1188976"},{"key":"e_1_3_2_1_26_1","volume-title":"Prediction of head motion from speech waveforms with a canonical-correlation-constrained autoencoder. arXiv preprint arXiv:2002.01869","author":"Lu JinHong","year":"2020","unstructured":"JinHong Lu and Hiroshi Shimodaira . 2020. Prediction of head motion from speech waveforms with a canonical-correlation-constrained autoencoder. arXiv preprint arXiv:2002.01869 ( 2020 ). JinHong Lu and Hiroshi Shimodaira. 2020. Prediction of head motion from speech waveforms with a canonical-correlation-constrained autoencoder. arXiv preprint arXiv:2002.01869 (2020)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.0963-7214.2004.01502010.x"},{"key":"e_1_3_2_1_28_1","volume-title":"Joon Son Chung, and Andrew Zisserman","author":"Nagrani Arsha","year":"2017","unstructured":"Arsha Nagrani , Joon Son Chung, and Andrew Zisserman . 2017 . Voxceleb : a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612 (2017). Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. 2017. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612 (2017)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/QOMEX.2009.5246972"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Omkar M Parkhi Andrea Vedaldi Andrew Zisserman etal 2015. Deep face recognition. In bmvc Vol. 1. 6.  Omkar M Parkhi Andrea Vedaldi Andrew Zisserman et al. 2015. Deep face recognition. In bmvc Vol. 1. 6.","DOI":"10.5244\/C.29.41"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413532"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Sanjana Sinha Sandika Biswas and Brojeshwar Bhowmick. 2020. Identity-Preserving Realistic Talking Face Generation. In arXiv. arXiv-2005.  Sanjana Sinha Sandika Biswas and Brojeshwar Bhowmick. 2020. Identity-Preserving Realistic Talking Face Generation. In arXiv. arXiv-2005.","DOI":"10.1109\/IJCNN48605.2020.9206665"},{"key":"e_1_3_2_1_34_1","volume-title":"Everybody's Talkin': Let Me Talk as You Want. arXiv preprint arXiv:2001.05201","author":"Song Linsen","year":"2020","unstructured":"Linsen Song , Wayne Wu , Chen Qian , Ran He , and Chen Change Loy . 2020. Everybody's Talkin': Let Me Talk as You Want. arXiv preprint arXiv:2001.05201 ( 2020 ). Linsen Song, Wayne Wu, Chen Qian, Ran He, and Chen Change Loy. 2020. Everybody's Talkin': Let Me Talk as You Want. arXiv preprint arXiv:2001.05201 (2020)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/3367032.3367163"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.86"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073640"},{"key":"e_1_3_2_1_38_1","volume-title":"Neural voice puppetry: Audio-driven facial reenactment. arXiv preprint arXiv:1912.05566","author":"Thies Justus","year":"2019","unstructured":"Justus Thies , Mohamed Elgharib , Ayush Tewari , Christian Theobalt , and Matthias Nie\u00dfner . 2019. Neural voice puppetry: Audio-driven facial reenactment. arXiv preprint arXiv:1912.05566 ( 2019 ). Justus Thies, Mohamed Elgharib, Ayush Tewari, Christian Theobalt, and Matthias Nie\u00dfner. 2019. Neural voice puppetry: Audio-driven facial reenactment. arXiv preprint arXiv:1912.05566 (2019)."},{"key":"e_1_3_2_1_39_1","article-title":"Visualizing data using t-SNE","volume":"9","author":"der Maaten Laurens Van","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton . 2008 . Visualizing data using t-SNE . Journal of machine learning research 9 , 11 (2008). Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, 11 (2008).","journal-title":"Journal of machine learning research"},{"key":"e_1_3_2_1_40_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 37--40","author":"Vougioukas Konstantinos","year":"2019","unstructured":"Konstantinos Vougioukas , Samsung AI Center , Stavros Petridis , and Maja Pantic . 2019 . End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 37--40 . Konstantinos Vougioukas, Samsung AI Center, Stavros Petridis, and Maja Pantic. 2019. End-to-End Speech-Driven Realistic Facial Animation with Temporal GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 37--40."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01251-8"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454738"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_2_1_44_1","volume-title":"Audio-driven Talking Face Video Generation with Natural Head Pose. arXiv preprint arXiv:2002.10137","author":"Yi Ran","year":"2020","unstructured":"Ran Yi , Zipeng Ye , Juyong Zhang , Hujun Bao , and Yong-Jin Liu . 2020. Audio-driven Talking Face Video Generation with Natural Head Pose. arXiv preprint arXiv:2002.10137 ( 2020 ). Ran Yi, Zipeng Ye, Juyong Zhang, Hujun Bao, and Yong-Jin Liu. 2020. Audio-driven Talking Face Video Generation with Natural Head Pose. arXiv preprint arXiv:2002.10137 (2020)."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00955"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00366"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019299"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00416"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417774"}],"event":{"name":"ICVGIP '21: Indian Conference on Computer Vision, Graphics and Image Processing","location":"Jodhpur India","acronym":"ICVGIP '21"},"container-title":["Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3490035.3490305","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3490035.3490305","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:23Z","timestamp":1750188683000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3490035.3490305"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,19]]},"references-count":49,"alternative-id":["10.1145\/3490035.3490305","10.1145\/3490035"],"URL":"https:\/\/doi.org\/10.1145\/3490035.3490305","relation":{},"subject":[],"published":{"date-parts":[[2021,12,19]]},"assertion":[{"value":"2021-12-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}