{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:45:37Z","timestamp":1773193537261,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2024,6,13]],"date-time":"2024-06-13T00:00:00Z","timestamp":1718236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Outstanding Talents Training Fund in Shenzhen"},{"name":"Shenzhen Science and Technology Program\u2013Shenzhen Cultivation of Excellent Scientific and Technological Innovation Talents project","award":["RCJC20200714114435057"],"award-info":[{"award-number":["RCJC20200714114435057"]}]},{"name":"Shenzhen Science and Technology Program\u2013Shenzhen Hong Kong joint funding project","award":["SGDX20211123144400001"],"award-info":[{"award-number":["SGDX20211123144400001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U21B2012 and R24115SG"],"award-info":[{"award-number":["U21B2012 and R24115SG"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"MIGU-PKU Meta Vision Technology Innovation Lab"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,8,31]]},"abstract":"<jats:p>Learned video compression has drawn great attention and shown promising compression performance recently. In this article, we focus on the two components in the learned video compression framework, the conditional entropy model and quality enhancement module, to improve compression performance. 
Specifically, we propose an adaptive spatial-temporal entropy model for image, motion, and residual compression, which introduces a temporal prior to reduce temporal redundancy of latents and an additional modulated mask to evaluate the similarity and perform refinement. In addition, a quality enhancement module is proposed for predicted frame and reconstructed frame to improve frame quality and reduce the bitrate cost of residual coding. The module reuses decoded optical flow as a motion prior and utilizes deformable convolution to mine high-quality information from the reference frame in a bit-free manner. The two proposed coding tools are integrated into a pixel-domain residual coding\u2013based compression framework to evaluate their effectiveness. Experimental results demonstrate that our framework achieves competitive compression performance in the low-delay scenario compared with recent learning-based methods and traditional H.265\/HEVC in terms of Peak Signal-to-Noise Ratio (PSNR) and Multi-Scale Structural Similarity Index (MS-SSIM). 
The code is available at OpenLVC.<\/jats:p>","DOI":"10.1145\/3661824","type":"journal-article","created":{"date-parts":[[2024,4,27]],"date-time":"2024-04-27T09:28:01Z","timestamp":1714210081000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9729-1294","authenticated-orcid":false,"given":"Jiayu","family":"Yang","sequence":"first","affiliation":[{"name":"Peking University Shenzhen Graduate School, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2523-8408","authenticated-orcid":false,"given":"Chunhui","family":"Yang","sequence":"additional","affiliation":[{"name":"Peking University Shenzhen Graduate School, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4013-2203","authenticated-orcid":false,"given":"Fei","family":"Xiong","sequence":"additional","affiliation":[{"name":"Peking University Shenzhen Graduate School, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3748-1392","authenticated-orcid":false,"given":"Yongqi","family":"Zhai","sequence":"additional","affiliation":[{"name":"Peking University Shenzhen Graduate School, Shenzhen, China and Peng Cheng Laboratory, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0873-0465","authenticated-orcid":false,"given":"Ronggang","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, China, Peng Cheng Laboratory, Shenzhen, China, and MIGU Video Co., Ltd., Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2024,6,13]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"1141","volume-title":"Advances in Neural Information Processing Systems","author":"Agustsson 
Eirikur","year":"2017","unstructured":"Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, and Luc V. Gool. 2017. Soft-to-hard vector quantization for end-to-end learning compressible representations. In Advances in Neural Information Processing Systems. 1141\u20131151."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00853"},{"key":"e_1_3_2_4_2","article-title":"Density modeling of images using a generalized normalization transformation","author":"Ball\u00e9 Johannes","year":"2015","unstructured":"Johannes Ball\u00e9, Valero Laparra, and Eero P. Simoncelli. 2015. Density modeling of images using a generalized normalization transformation. arXiv preprint arXiv:1511.06281 (2015).","journal-title":"arXiv preprint arXiv:1511.06281"},{"key":"e_1_3_2_5_2","volume-title":"International Conference on Learning Representations","author":"Ball\u00e9 Johannes","year":"2016","unstructured":"Johannes Ball\u00e9, Valero Laparra, and Eero P. Simoncelli. 2016. End-to-end optimized image compression. In International Conference on Learning Representations."},{"key":"e_1_3_2_6_2","volume-title":"International Conference on Learning Representations","author":"Ball\u00e9 Johannes","year":"2018","unstructured":"Johannes Ball\u00e9, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. 2018. Variational image compression with a scale hyperprior. In International Conference on Learning Representations."},{"key":"e_1_3_2_7_2","article-title":"CompressAI: A PyTorch library and evaluation platform for end-to-end compression research","author":"B\u00e9gaint Jean","year":"2020","unstructured":"Jean B\u00e9gaint, Fabien Racap\u00e9, Simon Feltman, and Akshay Pushparaja. 2020. CompressAI: A PyTorch library and evaluation platform for end-to-end compression research. 
arXiv preprint arXiv:2011.03029 (2020).","journal-title":"arXiv preprint arXiv:2011.03029"},{"key":"e_1_3_2_8_2","article-title":"Calculation of average PSNR differences between RD-curves","author":"Bjontegaard Gisle","year":"2001","unstructured":"Gisle Bjontegaard. 2001. Calculation of average PSNR differences between RD-curves. VCEG-M33 (2001).","journal-title":"VCEG-M33"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3101953"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00796"},{"key":"e_1_3_2_11_2","article-title":"MMCV: OpenMMLab Computer Vision Foundation","author":"Contributors MMCV","year":"2018","unstructured":"MMCV Contributors. 2018. MMCV: OpenMMLab Computer Vision Foundation. Retrieved from https:\/\/github.com\/open-mmlab\/mmcv.","journal-title":"R"},{"key":"e_1_3_2_12_2","article-title":"G-VAE: A continuously variable rate deep image compression framework","author":"Cui Ze","year":"2020","unstructured":"Ze Cui, Jing Wang, Bo Bai, Tiansheng Guo, and Yihui Feng. 2020. G-VAE: A continuously variable rate deep image compression framework. arXiv preprint arXiv:2003.02012 (2020).","journal-title":"arXiv preprint arXiv:2003.02012"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00652"},{"key":"e_1_3_2_14_2","article-title":"Versatile learned video compression","author":"Feng Runsen","year":"2021","unstructured":"Runsen Feng, Zongyu Guo, Zhizheng Zhang, and Zhibo Chen. 2021. Versatile learned video compression. arXiv preprint arXiv:2111.03386 (2021).","journal-title":"arXiv preprint arXiv:2111.03386"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548156"},{"key":"e_1_3_2_16_2","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Golinski Adam","year":"2020","unstructured":"Adam Golinski, Reza Pourreza, Yang Yang, Guillaume Sautiere, and Taco S. Cohen. 2020. 
Feedback recurrent autoencoder for video compression. In Proceedings of the Asian Conference on Computer Vision."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2023.3287495"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3089491"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00713"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19787-1_12"},{"key":"e_1_3_2_21_2","article-title":"PyTorch Video Compression","author":"Hu Zhihao","year":"2020","unstructured":"Zhihao Hu. 2020. PyTorch Video Compression. Retrieved from https:\/\/github.com\/ZhihaoHu\/PyTorchVideoCompression","journal-title":"R"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_12"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00583"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00155"},{"key":"e_1_3_2_25_2","article-title":"Deep contextual video compression","volume":"34","author":"Li Jiahao","year":"2021","unstructured":"Jiahao Li, Bin Li, and Yan Lu. 2021. Deep contextual video compression. 
Advances in Neural Information Processing Systems 34 (2021).","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547845"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00360"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3233221"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00076"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP46576.2022.9897989"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3150014"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2020.3035680"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6825"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_27"},{"key":"e_1_3_2_35_2","article-title":"Decoupled weight decay regularization","author":"Loshchilov Ilya","year":"2017","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).","journal-title":"arXiv preprint arXiv:1711.05101"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01126"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2988453"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00462"},{"key":"e_1_3_2_39_2","article-title":"VCT: A video compression transformer","author":"Mentzer Fabian","year":"2022","unstructured":"Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, and Eirikur Agustsson. 2022. VCT: A video compression transformer. 
arXiv preprint arXiv:2206.07307 (2022).","journal-title":"arXiv preprint arXiv:2206.07307"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3339825.3394937"},{"key":"e_1_3_2_41_2","article-title":"Joint autoregressive and hierarchical priors for learned image compression","author":"Minnen David","year":"2018","unstructured":"David Minnen, Johannes Ball\u00e9, and George Toderici. 2018. Joint autoregressive and hierarchical priors for learned image compression. arXiv preprint arXiv:1809.02736 (2018).","journal-title":"arXiv preprint arXiv:1809.02736"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP40778.2020.9190935"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00661"},{"key":"e_1_3_2_44_2","article-title":"Boosting neural video codecs by exploiting hierarchical redundancy","author":"Pourreza Reza","year":"2022","unstructured":"Reza Pourreza, Hoang Le, Amir Said, Guillaume Sautiere, and Auke Wiggers. 2022. Boosting neural video codecs by exploiting hierarchical redundancy. arXiv preprint arXiv:2208.04303 (2022).","journal-title":"arXiv preprint arXiv:2208.04303"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.291"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01421"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3220421"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19800-7_36"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2012.2221191"},{"key":"e_1_3_2_50_2","article-title":"Spatiotemporal entropy model is all you need for learned video compression","author":"Sun Zhenhong","year":"2021","unstructured":"Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Dongyang Li, Yichen Qian, and Hao Li. 2021. Spatiotemporal entropy model is all you need for learned video compression. 
arXiv preprint arXiv:2104.06083 (2021).","journal-title":"arXiv preprint arXiv:2104.06083"},{"key":"e_1_3_2_51_2","article-title":"Lossy image compression with compressive autoencoders","author":"Theis Lucas","year":"2017","unstructured":"Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Husz\u00e1r. 2017. Lossy image compression with compressive autoencoders. arXiv preprint arXiv:1703.00395 (2017).","journal-title":"arXiv preprint arXiv:1703.00395"},{"key":"e_1_3_2_52_2","article-title":"Exploring long & short range temporal information for learned video compression","author":"Wang Huairui","year":"2022","unstructured":"Huairui Wang and Zhenzhong Chen. 2022. Exploring long & short range temporal information for learned video compression. arXiv preprint arXiv:2208.03754 (2022).","journal-title":"arXiv preprint arXiv:2208.03754"},{"key":"e_1_3_2_53_2","article-title":"Learned video compression via heterogeneous deformable compensation network","author":"Wang Huairui","year":"2022","unstructured":"Huairui Wang, Zhenzhong Chen, and Chang Wen Chen. 2022. Learned video compression via heterogeneous deformable compensation network. 
arXiv preprint arXiv:2207.04589 (2022).","journal-title":"arXiv preprint arXiv:2207.04589"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2016.7532610"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2003.815165"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-018-01144-2"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00666"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2020.3043590"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3138300"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548314"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00953"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661824","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3661824","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:06:21Z","timestamp":1750291581000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661824"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,13]]},"references-count":60,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,8,31]]}},"alternative-id":["10.1145\/3661824"],"URL":"https:\/\/doi.org\/10.1145\/3661824","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,13]]},"assertion":[{"value":"2023-08-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label"
:"Publication History"}},{"value":"2024-04-17","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}