{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T23:50:28Z","timestamp":1768434628104,"version":"3.49.0"},"reference-count":75,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62431015 and 62571317"],"award-info":[{"award-number":["62431015 and 62571317"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Fundamental Research Funds for the Central Universities, the Shanghai Key Laboratory of Digital Media Processing and Transmission","award":["22DZ2229005"],"award-info":[{"award-number":["22DZ2229005"]}]},{"DOI":"10.13039\/501100013314","name":"111 Project","doi-asserted-by":"crossref","award":["BP0719010"],"award-info":[{"award-number":["BP0719010"]}],"id":[{"id":"10.13039\/501100013314","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>With the rapid development of social media, the amount of face video data has grown rapidly, making face video compression a hot research topic. Traditional video coding techniques do not discriminate video content and compress all videos in the same way, while talking head video compression should have more potential. Existing generative compression methods mostly adopt static reference frames, resulting in a decrease in fidelity caused by dynamic background or large pose change. In this article, we propose a hybrid compression scheme for face videos which combines traditional coding with generative compression. On the one hand, we sample and encode key frames with traditional codecs to provide dynamic reference frames which contain real-time background and motion information. 
On the other hand, we devise a deep video generation model to synthesize smooth video frames according to the extracted sparse keypoints. Combining the pixel-level recovery capability of traditional coding with the detail generation capability of deep generative models, our proposed hybrid scheme is able to implement high-fidelity face video compression at low bitrate in real time. Additionally, we also devise a Portrait Recovery module to recover the low-quality key frames, improving the reconstruction quality in low-bitrate scenarios. Extensive experiments show that our method has advantages over traditional codecs and existing generative compression methods in terms of both rate-distortion performance and coding complexity.<\/jats:p>","DOI":"10.1145\/3783982","type":"journal-article","created":{"date-parts":[[2025,12,12]],"date-time":"2025-12-12T02:59:55Z","timestamp":1765508395000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Hybrid Scheme for Face Video Compression"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9772-3293","authenticated-orcid":false,"given":"Anni","family":"Tang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1728-4754","authenticated-orcid":false,"given":"Zhiyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5028-1291","authenticated-orcid":false,"given":"Chen","family":"Zhu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7260-7141","authenticated-orcid":false,"given":"Jun","family":"Ling","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8261-5337","authenticated-orcid":false,"given":"Rong","family":"Xie","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7124-5182","authenticated-orcid":false,"given":"Li","family":"Song","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2026,1,14]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"Wiki. 2013. The x265 Website. Retrieved from https:\/\/bitbucket.org\/multicoreware\/x265_git\/wiki\/Home"},{"key":"e_1_3_3_3_2","unstructured":"VVC. 2020. Versatile Video Coding. Rec. ITU-T H.266 and ISO\/IEC 23090-3 (VVC). Retrieved from https:\/\/www.itu.int\/rec\/T-REC-H.266"},{"issue":"6","key":"e_1_3_3_4_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3130800.3130818","article-title":"Bringing portraits to life","volume":"36","author":"Averbuch-Elor Hadar","year":"2017","unstructured":"Hadar Averbuch-Elor, Daniel Cohen-Or, Johannes Kopf, and Michael F. Cohen. 2017. Bringing portraits to life. ACM Transactions on Graphics 36, 6 (2017), 1\u201313.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_3_5_2","first-page":"187","volume-title":"Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques","author":"Blanz Volker","year":"1999","unstructured":"Volker Blanz and Thomas Vetter. 1999. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 187\u2013194."},{"key":"e_1_3_3_6_2","first-page":"13786","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Burkov Egor","year":"2020","unstructured":"Egor Burkov, Igor Pasechnik, Artur Grigorev, and Victor Lempitsky. 2020. Neural head reenactment with latent pose descriptors. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 13786\u201313795."},{"key":"e_1_3_3_7_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1109\/DCC58796.2024.00018","volume-title":"Proceedings of the 2024 Data Compression Conference (DCC)","author":"Chen Bolin","year":"2024","unstructured":"Bolin Chen, Jie Chen, Shiqi Wang, and Yan Ye. 2024. Generative face video coding techniques and standardization efforts: A review. In Proceedings of the 2024 Data Compression Conference (DCC), 103\u2013112. DOI: 10.1109\/DCC58796.2024.00018"},{"key":"e_1_3_3_8_2","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1109\/DCC52660.2022.00009","volume-title":"Proceedings of the 2022 Data Compression Conference (DCC)","author":"Chen Bolin","year":"2022","unstructured":"Bolin Chen, Zhao Wang, Binzhe Li, Rongqun Lin, Shiqi Wang, and Yan Ye. 2022. Beyond keypoint coding: Temporal evolution inference with compact feature representation for talking face video compression. In Proceedings of the 2022 Data Compression Conference (DCC), 13\u201322. DOI: 10.1109\/DCC52660.2022.00009"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3271130"},{"key":"e_1_3_3_10_2","first-page":"1","volume-title":"Proceedings of the 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP)","author":"Chen Zhehao","year":"2022","unstructured":"Zhehao Chen, Ming Lu, Hao Chen, and Zhan Ma. 2022. Robust ultralow bitrate video conferencing with second order motion coherency. In Proceedings of the 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP). IEEE, 1\u20136."},{"issue":"4","key":"e_1_3_3_11_2","first-page":"75","article-title":"Learning temporal coherence via self-supervision for GAN-based video generation","volume":"39","author":"Chu Mengyu","year":"2020","unstructured":"Mengyu Chu, You Xie, Jonas Mayer, Laura Leal-Taix\u00e9, and Nils Thuerey. 2020. 
Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics 39, 4 (2020), 75\u201371.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_3_12_2","doi-asserted-by":"crossref","unstructured":"Joon Son Chung Arsha Nagrani and Andrew Zisserman. 2018. VoxCeleb2: Deep speaker recognition. arXiv:1806.05622. Retrieved from https:\/\/arxiv.org\/abs\/1806.05622","DOI":"10.21437\/Interspeech.2018-1929"},{"issue":"5","key":"e_1_3_3_13_2","first-page":"2567","article-title":"Image quality assessment: Unifying structure and texture similarity","volume":"44","author":"Ding Keyan","year":"2020","unstructured":"Keyan Ding, Kede Ma, Shiqi Wang, and Eero P. Simoncelli. 2020. Image quality assessment: Unifying structure and texture similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 5 (2020), 2567\u20132581.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_3_14_2","unstructured":"Brian Dolhansky Joanna Bitton Ben Pflaum Jikuo Lu Russ Howes Menglin Wang and Cristian Canton Ferrer. 2020. The deepfake detection challenge (DFDC) dataset. arXiv:2006.07397. Retrieved from https:\/\/arxiv.org\/abs\/2006.07397"},{"key":"e_1_3_3_15_2","first-page":"2663","article-title":"Soft-gated warping-GAN for pose-guided person image synthesis","volume":"31","author":"Dong Haoye","year":"2018","unstructured":"Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, and Jian Yin. 2018. Soft-gated warping-GAN for pose-guided person image synthesis. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 
31, 2663\u20132671.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_16_2","doi-asserted-by":"crossref","first-page":"2663","DOI":"10.1145\/3503161.3547838","volume-title":"Proceedings of the 30th ACM International Conference on Multimedia","author":"Drobyshev Nikita","year":"2022","unstructured":"Nikita Drobyshev, Jenya Chelishev, Taras Khakhulin, Aleksei Ivakhnenko, Victor Lempitsky, and Egor Zakharov. 2022. MegaPortraits: One-shot megapixel neural head avatars. In Proceedings of the 30th ACM International Conference on Multimedia, 2663\u20132671."},{"key":"e_1_3_3_17_2","first-page":"1","volume-title":"Proceedings of the 2021 IEEE International Conference on Multimedia and Expo Workshops","author":"Feng Dahu","year":"2021","unstructured":"Dahu Feng, Yan Huang, Yiwei Zhang, Jun Ling, Anni Tang, and Li Song. 2021. A generative compression framework for low bandwidth video conference. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo Workshops, 1\u20136. DOI: https:\/\/doi.org\/10\/gnbdr7"},{"issue":"2","key":"e_1_3_3_18_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3491226","article-title":"Transform, warp, and dress: A new transformation-guided model for virtual try-on","volume":"18","author":"Fincato Matteo","year":"2022","unstructured":"Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, and Rita Cucchiara. 2022. Transform, warp, and dress: A new transformation-guided model for virtual try-on. 
ACM Transactions on Multimedia Computing, Communications, and Applications 18, 2 (2022), 1\u201324.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"issue":"6","key":"e_1_3_3_19_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3272127.3275043","article-title":"Warp-guided GANs for single-photo facial animation","volume":"37","author":"Geng Jiahao","year":"2018","unstructured":"Jiahao Geng, Tianjia Shao, Youyi Zheng, Yanlin Weng, and Kun Zhou. 2018. Warp-guided GANs for single-photo facial animation. ACM Transactions on Graphics 37, 6 (2018), 1\u201312.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_3_20_2","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow Ian","year":"2014","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 27, 2672\u20132680.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"issue":"5","key":"e_1_3_3_21_2","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1109\/79.952802","article-title":"Theoretical foundations of transform coding","volume":"18","author":"Goyal Vivek K.","year":"2001","unstructured":"Vivek K. Goyal. 2001. Theoretical foundations of transform coding. IEEE Signal Processing Magazine 18, 5 (2001), 9\u201321.","journal-title":"IEEE Signal Processing Magazine"},{"key":"e_1_3_3_22_2","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1109\/ASRU.2013.6707742","volume-title":"Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Graves Alex","year":"2013","unstructured":"Alex Graves, Navdeep Jaitly, and Abdel-Rahman Mohamed. 2013. Hybrid speech recognition with deep bidirectional LSTM. 
In Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE, 273\u2013278."},{"key":"e_1_3_3_23_2","first-page":"10861","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"34","author":"Gu Kuangxiao","year":"2020","unstructured":"Kuangxiao Gu, Yuqian Zhou, and Thomas Huang. 2020. FLNet: Landmark driven fetching and learning network for faithful talking facial animation synthesis. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 10861\u201310868."},{"key":"e_1_3_3_24_2","first-page":"5784","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Guo Yudong","year":"2021","unstructured":"Yudong Guo, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, and Juyong Zhang. 2021. AD-NeRF: Audio driven neural radiance fields for talking head synthesis. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 5784\u20135794."},{"key":"e_1_3_3_25_2","first-page":"10893","volume":"34","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Ha Sungjoo","year":"2020","unstructured":"Sungjoo Ha, Martin Kersner, Beomsu Kim, Seokjun Seo, and Dongyoung Kim. 2020. Marionette: Few-shot face reenactment preserving identity of unseen targets. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 10893\u201310900."},{"key":"e_1_3_3_26_2","first-page":"6626","article-title":"GANs trained by a two time-scale update rule converge to a local NASH equilibrium","volume":"30","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local NASH equilibrium. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 
30, 6626\u20136637.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_27_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Hong Fa-Ting","year":"2022","unstructured":"Fa-Ting Hong, Longhao Zhang, Li Shen, and Dan Xu. 2022. Depth-aware generative adversarial network for talking head video generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"issue":"2","key":"e_1_3_3_28_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3311784","article-title":"Look at me! Correcting eye gaze in live video communication","volume":"15","author":"Hsu Chih-Fan","year":"2019","unstructured":"Chih-Fan Hsu, Yu-Shuen Wang, Chin-Laung Lei, and Kuan-Ta Chen. 2019. Look at me! Correcting eye gaze in live video communication. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 2 (2019), 1\u201321.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_3_29_2","first-page":"1502","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Hu Zhihao","year":"2021","unstructured":"Zhihao Hu, Guo Lu, and Dong Xu. 2021. FVC: A new framework towards deep video compression in feature space. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1502\u20131511."},{"issue":"11","key":"e_1_3_3_30_2","doi-asserted-by":"crossref","first-page":"4454","DOI":"10.1109\/TCSVT.2021.3053635","article-title":"Modeling acceleration properties for flexible INTRA HEVC complexity control","volume":"31","author":"Huang Yan","year":"2021","unstructured":"Yan Huang, Li Song, Rong Xie, Ebroul Izquierdo, and Wenjun Zhang. 2021. Modeling acceleration properties for flexible INTRA HEVC complexity control. 
IEEE Transactions on Circuits and Systems for Video Technology 31, 11 (2021), 4454\u20134469.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_3_31_2","first-page":"694","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Johnson Justin","year":"2016","unstructured":"Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision. Springer, 694\u2013711."},{"key":"e_1_3_3_32_2","first-page":"4401","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Karras Tero","year":"2019","unstructured":"Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 4401\u20134410."},{"issue":"4","key":"e_1_3_3_33_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3197517.3201283","article-title":"Deep video portraits","volume":"37","author":"Kim Hyeongwoo","year":"2018","unstructured":"Hyeongwoo Kim, Pablo Garrido, Ayush Tewari, Weipeng Xu, Justus Thies, Matthias Niessner, Patrick P\u00e9rez, Christian Richardt, Michael Zollh\u00f6fer, and Christian Theobalt. 2018. Deep video portraits. ACM Transactions on Graphics 37, 4 (2018), 1\u201314.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_3_34_2","doi-asserted-by":"crossref","unstructured":"Goluck Konuko St\u00e9phane Lathuili\u00e8re and Giuseppe Valenzise. 2022. A hybrid deep animation codec for low-bitrate video conferencing. arXiv:2207.13530. 
Retrieved from https:\/\/arxiv.org\/abs\/2207.13530","DOI":"10.1109\/ICIP46576.2022.10458867"},{"key":"e_1_3_3_35_2","first-page":"2810","volume-title":"Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP)","author":"Konuko Goluck","year":"2023","unstructured":"Goluck Konuko, St\u00e9phane Lathuili\u00e8re, and Giuseppe Valenzise. 2023. Predictive coding for animation-based video compression. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP). IEEE, 2810\u20132814."},{"key":"e_1_3_3_36_2","first-page":"4210","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP \u201921)","author":"Konuko Goluck","year":"2021","unstructured":"Goluck Konuko, Giuseppe Valenzise, and St\u00e9phane Lathuili\u00e8re. 2021. Ultra-low bitrate video conferencing using deep image animation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP \u201921). IEEE, 4210\u20134214."},{"issue":"2","key":"e_1_3_3_37_2","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1147\/rd.282.0135","article-title":"An introduction to arithmetic coding","volume":"28","author":"Langdon Glen G.","year":"1984","unstructured":"Glen G. Langdon. 1984. An introduction to arithmetic coding. IBM Journal of Research and Development 28, 2 (1984), 135\u2013149.","journal-title":"IBM Journal of Research and Development"},{"key":"e_1_3_3_38_2","first-page":"18114","article-title":"Deep contextual video compression","volume":"34","author":"Li Jiahao","year":"2021","unstructured":"Jiahao Li, Bin Li, and Yan Lu. 2021. Deep contextual video compression. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 
34, 18114\u201318125.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_39_2","first-page":"22616","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Jiahao","year":"2023","unstructured":"Jiahao Li, Bin Li, and Yan Lu. 2023. Neural video compression with diverse contexts. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 22616\u201322626."},{"issue":"2","key":"e_1_3_3_40_2","first-page":"2","article-title":"Toward a practical perceptual video quality metric","volume":"6","author":"Li Zhi","year":"2016","unstructured":"Zhi Li, Anne Aaron, Ioannis Katsavounidis, Anush Moorthy, and Megha Manohara. 2016. Toward a practical perceptual video quality metric. The Netflix Tech Blog 6, 2 (2016), 2.","journal-title":"The Netflix Tech Blog"},{"key":"e_1_3_3_41_2","first-page":"5904","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Liu Wen","year":"2019","unstructured":"Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, and Shenghua Gao. 2019. Liquid warping GAN: A unified framework for human motion imitation, appearance transfer and novel view synthesis. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 5904\u20135913."},{"key":"e_1_3_3_42_2","first-page":"106","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Liu Xian","year":"2022","unstructured":"Xian Liu, Yinghao Xu, Qianyi Wu, Hang Zhou, Wayne Wu, and Bolei Zhou. 2022. Semantic-aware implicit neural audio-driven video portrait generation. In Proceedings of the European Conference on Computer Vision. 
Springer, 106\u2013125."},{"key":"e_1_3_3_43_2","first-page":"11006","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Lu Guo","year":"2019","unstructured":"Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, and Zhiyong Gao. 2019. DVC: An end-to-end deep video compression framework. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 11006\u201311015."},{"issue":"7","key":"e_1_3_3_44_2","first-page":"1901","article-title":"Convolutional neural network-based arithmetic coding for HEVC intra-predicted residues","volume":"30","author":"Ma Changyue","year":"2019","unstructured":"Changyue Ma, Dong Liu, Xiulian Peng, Li Li, and Feng Wu. 2019. Convolutional neural network-based arithmetic coding for HEVC intra-predicted residues. IEEE Transactions on Circuits and Systems for Video Technology 30, 7 (2019), 1901\u20131916.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"3","key":"e_1_3_3_45_2","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1109\/76.305878","article-title":"Motion compensation based on spatial transformations","volume":"4","author":"Nakaya Yuichiro","year":"1994","unstructured":"Yuichiro Nakaya and Hiroshi Harashima. 1994. Motion compensation based on spatial transformations. IEEE Transactions on Circuits and Systems for Video Technology 4, 3 (1994), 339\u2013356.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_3_46_2","unstructured":"Telecommunication Standardization Sector of ITU. 1994. ITU-T Recommendation G.114: Transmission Systems and Media: General Recommendations on the Transmission Quality for an Entire International Telephone Connection: One-Way Transmission Time. International Telecommunication Union. 
Retrieved from https:\/\/books.google.co.jp\/books?id=eQ9RHwAACAAJ"},{"key":"e_1_3_3_47_2","first-page":"2388","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops","author":"Oquab Maxime","year":"2021","unstructured":"Maxime Oquab, Pierre Stock, Daniel Haziza, Tao Xu, Peizhao Zhang, Onur Celebi, Yana Hasson, Patrick Labatut, Bobo Bose-Kolanu, Thibault Peyronel, et al. 2021. Low bandwidth video-chat compression using deep generative models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2388\u20132397."},{"key":"e_1_3_3_48_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Park Taesung","year":"2019","unstructured":"Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_3_49_2","first-page":"296","volume-title":"Proceedings of the 2009 6th IEEE International Conference on Advanced Video and Signal Based Surveillance","author":"Paysan Pascal","year":"2009","unstructured":"Pascal Paysan, Reinhard Knothe, Brian Amberg, Sami Romdhani, and Thomas Vetter. 2009. A 3D face model for pose and illumination invariant face recognition. In Proceedings of the 2009 6th IEEE International Conference on Advanced Video and Signal Based Surveillance. IEEE, 296\u2013301."},{"key":"e_1_3_3_50_2","first-page":"8620","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Pumarola Albert","year":"2018","unstructured":"Albert Pumarola, Antonio Agudo, Alberto Sanfeliu, and Francesc Moreno-Noguer. 2018. Unsupervised person image synthesis in arbitrary poses. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8620\u20138628."},{"key":"e_1_3_3_51_2","first-page":"13759","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Ren Yurui","year":"2021","unstructured":"Yurui Ren, Ge Li, Yuanqi Chen, Thomas H. Li, and Shan Liu. 2021. PIRenderer: Controllable portrait image generation via semantic neural rendering. In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 13759\u201313768."},{"issue":"9","key":"e_1_3_3_52_2","doi-asserted-by":"crossref","first-page":"1103","DOI":"10.1109\/TCSVT.2007.905532","article-title":"Overview of the scalable video coding extension of the H.264\/AVC standard","volume":"17","author":"Schwarz Heiko","year":"2007","unstructured":"Heiko Schwarz, Detlev Marpe, and Thomas Wiegand. 2007. Overview of the scalable video coding extension of the H.264\/AVC standard. IEEE Transactions on Circuits and Systems for Video Technology 17, 9 (2007), 1103\u20131120.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"3","key":"e_1_3_3_53_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3337067","article-title":"Synthesizing facial photometries and corresponding geometries using generative adversarial networks","volume":"15","author":"Shamai Gil","year":"2019","unstructured":"Gil Shamai, Ron Slossberg, and Ron Kimmel. 2019. Synthesizing facial photometries and corresponding geometries using generative adversarial networks. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 3s (2019), 1\u201324.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_3_3_54_2","first-page":"666","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Shen Shuai","year":"2022","unstructured":"Shuai Shen, Wanhua Li, Zheng Zhu, Yueqi Duan, Jie Zhou, and Jiwen Lu. 2022. 
Learning dynamic facial radiance fields for few-shot talking head synthesis. In Proceedings of the European Conference on Computer Vision. Springer, 666\u2013682."},{"key":"e_1_3_3_55_2","doi-asserted-by":"crossref","first-page":"7311","DOI":"10.1109\/TMM.2022.3220421","article-title":"Temporal context mining for learned video compression","volume":"25","author":"Sheng Xihua","year":"2022","unstructured":"Xihua Sheng, Jiahao Li, Bin Li, Li Li, Dong Liu, and Yan Lu. 2022. Temporal context mining for learned video compression. IEEE Transactions on Multimedia 25 (2022), 7311\u20137322.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_3_56_2","first-page":"616","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Shi Yibo","year":"2022","unstructured":"Yibo Shi, Yunying Ge, Jing Wang, and Jue Mao. 2022. AlphaVC: High-performance and efficient learned video compression. In Proceedings of the European Conference on Computer Vision. Springer, 616\u2013631."},{"issue":"16","key":"e_1_3_3_57_2","doi-asserted-by":"crossref","first-page":"2572","DOI":"10.3390\/electronics11162572","article-title":"Intra complexity control algorithm for VVC","volume":"11","author":"Shu Zhengjie","year":"2022","unstructured":"Zhengjie Shu, Junyi Li, Zongju Peng, Fen Chen, and Mei Yu. 2022. Intra complexity control algorithm for VVC. Electronics 11, 16 (2022), 2572.","journal-title":"Electronics"},{"key":"e_1_3_3_58_2","first-page":"7137","article-title":"First order motion model for image animation","volume":"32","author":"Siarohin Aliaksandr","year":"2019","unstructured":"Aliaksandr Siarohin, St\u00e9phane Lathuili\u00e8re, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019. First order motion model for image animation. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 
32, 7137\u20137147.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_59_2","first-page":"13653","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Siarohin Aliaksandr","year":"2021","unstructured":"Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai, and Sergey Tulyakov. 2021. Motion representations for articulated animation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 13653\u201313662."},{"issue":"12","key":"e_1_3_3_60_2","doi-asserted-by":"crossref","first-page":"1649","DOI":"10.1109\/TCSVT.2012.2221191","article-title":"Overview of the high efficiency video coding (HEVC) standard","volume":"22","author":"Sullivan Gary J.","year":"2012","unstructured":"Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 1649\u20131668.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"4","key":"e_1_3_3_61_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3072959.3073640","article-title":"Synthesizing Obama: Learning lip sync from audio","volume":"36","author":"Suwajanakorn Supasorn","year":"2017","unstructured":"Supasorn Suwajanakorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. 2017. Synthesizing Obama: Learning lip sync from audio. ACM Transactions on Graphics 36, 4 (2017), 1\u201313.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_3_62_2","first-page":"1","volume-title":"Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME)","author":"Tang Anni","year":"2022","unstructured":"Anni Tang, Yan Huang, Jun Ling, Zhiyu Zhang, Yiwei Zhang, Rong Xie, and Li Song. 2022. Generative compression for face video: A hybrid scheme. 
In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), 1\u20136. DOI: 10.1109\/ICME52920.2022.9859867"},{"key":"e_1_3_3_63_2","first-page":"1","volume-title":"Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG \u201921)","author":"Tang Anni","year":"2021","unstructured":"Anni Tang, Han Xue, Jun Ling, Rong Xie, and Li Song. 2021. Dense 3D coordinate code prior guidance for high-fidelity face swapping and face reenactment. In Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG \u201921). IEEE, 1\u20138."},{"issue":"6","key":"e_1_3_3_64_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3414685.3417803","article-title":"PIE: Portrait image embedding for semantic control","volume":"39","author":"Tewari Ayush","year":"2020","unstructured":"Ayush Tewari, Mohamed Elgharib, B. R. Mallikarjun, Florian Bernard, Hans-Peter Seidel, Patrick P\u00e9rez, Michael Zollh\u00f6fer, and Christian Theobalt. 2020. PIE: Portrait image embedding for semantic control. ACM Transactions on Graphics 39, 6 (2020), 1\u201314.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_3_65_2","first-page":"6142","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Tewari Ayush","year":"2020","unstructured":"Ayush Tewari, Mohamed Elgharib, Gaurav Bharaj, Florian Bernard, Hans-Peter Seidel, Patrick P\u00e9rez, Michael Zollhofer, and Christian Theobalt. 2020. StyleRig: Rigging styleGAN for 3D control over portrait images. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 6142\u20136151."},{"issue":"4","key":"e_1_3_3_66_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3306346.3323035","article-title":"Deferred neural rendering: Image synthesis using neural textures","volume":"38","author":"Thies Justus","year":"2019","unstructured":"Justus Thies, Michael Zollh\u00f6fer, and Matthias Nie\u00dfner. 2019. Deferred neural rendering: Image synthesis using neural textures. ACM Transactions on Graphics 38, 4 (2019), 1\u201312.","journal-title":"ACM Transactions on Graphics"},{"issue":"6","key":"e_1_3_3_67_2","first-page":"183","article-title":"Real-time expression transfer for facial reenactment","volume":"34","author":"Thies Justus","year":"2015","unstructured":"Justus Thies, Michael Zollh\u00f6fer, Matthias Nie\u00dfner, Levi Valgaerts, Marc Stamminger, and Christian Theobalt. 2015. Real-time expression transfer for facial reenactment. ACM Transactions on Graphics 34, 6 (2015), Article 183.","journal-title":"ACM Transactions on Graphics"},{"key":"e_1_3_3_68_2","first-page":"2387","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Thies Justus","year":"2016","unstructured":"Justus Thies, Michael Zollh\u00f6fer, Marc Stamminger, Christian Theobalt, and Matthias Nie\u00dfner. 2016. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2387\u20132395."},{"key":"e_1_3_3_69_2","first-page":"10039","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Ting-Chun","year":"2021","unstructured":"Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu. 2021. One-shot free-view neural talking-head synthesis for video conferencing. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 10039\u201310049."},{"key":"e_1_3_3_70_2","first-page":"1398","volume-title":"Proceedings of the 37th Asilomar Conference on Signals, Systems and Computers","volume":"2","author":"Wang Zhou","year":"2003","unstructured":"Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. 2003. Multiscale structural similarity for image quality assessment. In Proceedings of the 37th Asilomar Conference on Signals, Systems and Computers, Vol. 2. IEEE, 1398\u20131402."},{"key":"e_1_3_3_71_2","first-page":"1","volume-title":"Proceedings of the 2021 IEEE International Conference on Multimedia and Expo Workshops","author":"Wieckowski Adam","year":"2021","unstructured":"Adam Wieckowski, Jens Brandenburg, Tobias Hinz, Christian Bartnik, Valeri George, Gabriel Hege, Christian Helmrich, Anastasia Henkel, Christian Lehmann, Christian Stoffers, et al. 2021. VVenC: An open and optimized VVC encoder implementation. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo Workshops, 1\u20132. DOI: 10.1109\/ICMEW53276.2021.9455944."},{"issue":"7","key":"e_1_3_3_72_2","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1109\/TCSVT.2003.815165","article-title":"Overview of the H.264\/AVC video coding standard","volume":"13","author":"Wiegand Thomas","year":"2003","unstructured":"Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H.264\/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560\u2013576.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_3_73_2","unstructured":"Shunyu Yao, RuiZhe Zhong, Yichao Yan, Guangtao Zhai, and Xiaokang Yang. 2022. DFA-NeRF: Personalized talking head generation via disentangled face attributes neural rendering. arXiv:2201.00791. 
Retrieved from https:\/\/arxiv.org\/abs\/2201.00791"},{"key":"e_1_3_3_74_2","first-page":"524","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV \u201920)","author":"Zakharov Egor","year":"2020","unstructured":"Egor Zakharov, Aleksei Ivakhnenko, Aliaksandra Shysheya, and Victor Lempitsky. 2020. Fast bi-layer neural synthesis of one-shot realistic head avatars. In Proceedings of the 16th European Conference on Computer Vision (ECCV \u201920). Springer, 524\u2013540."},{"key":"e_1_3_3_75_2","first-page":"586","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Zhang Richard","year":"2018","unstructured":"Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 586\u2013595."},{"key":"e_1_3_3_76_2","first-page":"3657","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhao Jian","year":"2022","unstructured":"Jian Zhao and Hui Zhang. 2022. Thin-plate spline motion model for image animation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3657\u20133666."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3783982","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T13:28:31Z","timestamp":1768397311000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3783982"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,14]]},"references-count":75,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3783982"],"URL":"https:\/\/doi.org\/10.1145\/3783982","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,14]]},"assertion":[{"value":"2024-08-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}