{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T16:29:57Z","timestamp":1781886597530,"version":"3.54.5"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,7,31]]},"abstract":"<jats:p>The past decade has witnessed the huge success of deep learning in well-known artificial intelligence applications such as face recognition, autonomous driving, and large language models like ChatGPT. Recently, the application of deep learning has been extended to a much wider range of tasks, with Neural Network-Based Video Coding (NNVC) being one of them. NNVC can be performed at two different levels: embedding neural network-based (NN-based) coding tools into a classical video compression framework or building the entire compression framework upon neural networks. This article elaborates on our studies in response to the recent exploration efforts in JVET (Joint Video Experts Team of ITU-T SG 16 WP 3 and ISO\/IEC JTC 1\/SC29) under the name of NNVC, which fall into the former category. Specifically, in this article, we propose two advanced NN-based video coding technologies, i.e., NN-based intra prediction and NN-based in-loop filtering, which have been investigated for several meeting cycles in JVET and then adopted into the reference software, i.e., NNVC. In addition, we further propose a Small Ad-hoc Deep-Learning Library (SADL), which provides integer-based inference capabilities for neural networks to ensure interoperability across different systems. SADL has been adopted as the inference platform of all neural networks in NNVC. Extensive experiments on top of NNVC have been conducted to evaluate the effectiveness of the proposed techniques. 
Compared with VTM-11.0_nnvc, the two proposed NN-based coding tools jointly achieve {11.94%, 21.86%, 22.59%}, {9.18%, 19.76%, 20.92%}, and {10.63%, 21.56%, 23.02%} BD-rate reductions on average for {Y, Cb, Cr} under random-access, low-delay, and all-intra configurations, respectively.<\/jats:p>","DOI":"10.1145\/3733108","type":"journal-article","created":{"date-parts":[[2025,5,1]],"date-time":"2025-05-01T04:37:25Z","timestamp":1746074245000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Advanced Neural Network-Based Video Coding Technologies for Intra Prediction and In-Loop Filtering"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1679-2941","authenticated-orcid":false,"given":"Yue","family":"Li","sequence":"first","affiliation":[{"name":"Bytedance, San Diego, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7603-8599","authenticated-orcid":false,"given":"Junru","family":"Li","sequence":"additional","affiliation":[{"name":"Bytedance, San Diego, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-7770-6821","authenticated-orcid":false,"given":"Chaoyi","family":"Lin","sequence":"additional","affiliation":[{"name":"Bytedance, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6627-0009","authenticated-orcid":false,"given":"Kai","family":"Zhang","sequence":"additional","affiliation":[{"name":"Bytedance, San Diego, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3463-9211","authenticated-orcid":false,"given":"Li","family":"Zhang","sequence":"additional","affiliation":[{"name":"Bytedance, San Diego, California, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2123-7819","authenticated-orcid":false,"given":"Franck","family":"Galpin","sequence":"additional","affiliation":[{"name":"InterDigital, Rennes, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7751-5219","authenticated-orcid":false,"given":"Thierry","family":"Dumas","sequence":"additional","affiliation":[{"name":"InterDigital, Rennes, France"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3074-9328","authenticated-orcid":false,"given":"Hongtao","family":"Wang","sequence":"additional","affiliation":[{"name":"Qualcomm, San Diego, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6793-0912","authenticated-orcid":false,"given":"Muhammed","family":"Coban","sequence":"additional","affiliation":[{"name":"Qualcomm, San Diego, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8736-0798","authenticated-orcid":false,"given":"Jacob","family":"Str\u00f6m","sequence":"additional","affiliation":[{"name":"Ericsson, Stockholm, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0381-3085","authenticated-orcid":false,"given":"Du","family":"Liu","sequence":"additional","affiliation":[{"name":"Ericsson, Stockholm, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9965-6302","authenticated-orcid":false,"given":"Kenneth","family":"Andersson","sequence":"additional","affiliation":[{"name":"Ericsson, Stockholm, 
Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,7,18]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00853"},{"key":"e_1_3_2_3_2","article-title":"Common test conditions and evaluation procedures for neural network-based video coding technology","author":"Alshina Elena","year":"2023","unstructured":"Elena Alshina, Ru-Ling Liao, Shan Liu, and Andrew Segall. 2023. Common test conditions and evaluation procedures for neural network-based video coding technology. JVET-AC2016.","journal-title":"JVET-AC2016"},{"key":"e_1_3_2_4_2","article-title":"EE1-1.3: Combination of deblocking and NN","author":"Andersson Kenneth","year":"2022","unstructured":"Kenneth Andersson, Jacob Str\u00f6m, Du Liu, and Rickard Sj\u00f6berg. 2022. EE1-1.3: Combination of deblocking and NN. JVET-Z0070.","journal-title":"JVET-Z0070"},{"key":"e_1_3_2_5_2","volume-title":"Calculation of Average PSNR Differences between RD-Curves","author":"Bjontegaard Gisle","year":"2001","unstructured":"Gisle Bjontegaard. 2001. Calculation of Average PSNR Differences between RD-Curves. Technical Report VCEG-M33. VCEG."},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1109\/ICIP40778.2020.9191050","volume-title":"2020 IEEE International Conference on Image Processing (ICIP)","author":"Blanch Marc Gorriz","year":"2020","unstructured":"Marc Gorriz Blanch, Saverio Blasi, Alan Smeaton, Noel E. O\u2019Connor, and Marta Mrak. 2020. Chroma intra prediction with attention-based CNN architectures. In 2020 IEEE International Conference on Image Processing (ICIP). IEEE, 783\u2013787."},{"key":"e_1_3_2_7_2","article-title":"JVET common test conditions and software reference configurations for SDR video","author":"Bossen Frank","year":"2018","unstructured":"Frank Bossen, Jill Boyce, Xiang Li, Vadim Seregin, and Karsten S\u00fchring. 2018. 
JVET common test conditions and software reference configurations for SDR video. JVET-K1010.","journal-title":"JVET-K1010"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3101953"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3038348"},{"key":"e_1_3_2_10_2","first-page":"45","article-title":"The (new) Yale sparse matrix package","author":"Eisenstat S. C.","year":"1983","unstructured":"S. C. Eisenstat, Howard Elman, M. H. Schultz, and A. H. Sherman. 1983. The (new) Yale sparse matrix package. Elliptic Problem Solvers II (1983), 45\u201352.","journal-title":"Elliptic Problem Solvers"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2012.2221529"},{"key":"e_1_3_2_12_2","unstructured":"Franck Galpin, Pavel Nikitin, Thierry Dumas, and Philippe Bordes. 2021. SADL Small Adhoc Deep-Learning Library. Technical Report JVET-W0181. InterDigital. Retrieved from https:\/\/vcgit.hhi.fraunhofer.de\/jvet-ahg-nnvc\/sadl"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2920603"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2896489"},{"key":"e_1_3_2_15_2","unstructured":"Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, and Yan Lu. 2025. Towards practical real-time neural video compression. arXiv:2502.20762. Retrieved from https:\/\/arxiv.org\/abs\/2502.20762"},{"key":"e_1_3_2_16_2","first-page":"1","volume-title":"2016 Picture Coding Symposium (PCS)","author":"Karczewicz Marta","year":"2016","unstructured":"Marta Karczewicz, Li Zhang, Wei-Jung Chien, and Xiang Li. 2016. Geometry transformation-based adaptive in-loop filter. In 2016 Picture Coding Symposium (PCS). IEEE, 1\u20135."},{"key":"e_1_3_2_17_2","volume-title":"Picture Coding Symposium","author":"Koo Moonmo","year":"2019","unstructured":"Moonmo Koo, Mehdi Salehifar, Jaehyun Lim, and Seung-Hwan Kim. 2019. Low frequency non-separable transform (LFNST). 
In Picture Coding Symposium."},{"key":"e_1_3_2_18_2","first-page":"9","article-title":"Efficient backprop","author":"LeCun Yann","year":"1998","unstructured":"Yann LeCun, Leon Bottou, Genevieve B. Orr, and Klaus-Robert Muller. 1998. Efficient backprop. In Neural Networks: Tricks of the Trade. Gr\u00e9goire Montavon, Genevi\u00e8ve B. Orr, and Klaus-Robert M\u00fcller (Eds.). Springer, 9\u201348.","journal-title":"Neural Networks: Tricks of the Trade"},{"key":"e_1_3_2_19_2","first-page":"18114","article-title":"Deep contextual video compression","volume":"34","author":"Li Jiahao","year":"2021","unstructured":"Jiahao Li, Bin Li, and Yan Lu. 2021. Deep contextual video compression. Advances in Neural Information Processing Systems 34 (2021), 18114\u201318125.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_20_2","first-page":"26099","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Jiahao","year":"2024","unstructured":"Jiahao Li, Bin Li, and Yan Lu. 2024. Neural video compression with feature modulation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 26099\u201326108."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2817044"},{"issue":"9","key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"2316","DOI":"10.1109\/TCSVT.2017.2727682","article-title":"Convolutional neural network-based block up-sampling for intra frame coding","volume":"28","author":"Li Yue","year":"2017","unstructured":"Yue Li, Dong Liu, Houqiang Li, Li Li, Feng Wu, Hong Zhang, and Haitao Yang. 2017. Convolutional neural network-based block up-sampling for intra frame coding. 
IEEE Transactions on Circuits and Systems for Video Technology 28, 9 (2017), 2316\u20132330.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","first-page":"2104","DOI":"10.1109\/ICIP42928.2021.9506027","volume-title":"2021 IEEE International Conference on Image Processing (ICIP)","author":"Li Yue","year":"2021","unstructured":"Yue Li, Li Zhang, and Kai Zhang. 2021. Convolutional neural network based in-loop filter for VVC intra coding. In 2021 IEEE International Conference on Image Processing (ICIP). IEEE, 2104\u20132108."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3529107"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3152627"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01126"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2021.3108943"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1109\/DCC.2018.00027","volume-title":"2018 Data Compression Conference","author":"Meng Xiandong","year":"2018","unstructured":"Xiandong Meng, Chen Chen, Shuyuan Zhu, and Bing Zeng. 2018. A new HEVC in-loop filter based on multi-channel long-short-term dependency residual networks. In 2018 Data Compression Conference. IEEE, 187\u2013196."},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"1711","DOI":"10.1109\/ICIP46576.2022.9897324","volume-title":"2022 IEEE International Conference on Image Processing (ICIP)","author":"Merkle Philipp","year":"2022","unstructured":"Philipp Merkle, Martin Winken, Jonathan Pfaff, Heiko Schwarz, Detlev Marpe, and Thomas Wiegand. 2022. Intra-inter prediction for versatile video coding using a residual convolutional neural network. In 2022 IEEE International Conference on Image Processing (ICIP). 
IEEE, 1711\u20131715."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/OJSP.2021.3092598"},{"key":"e_1_3_2_31_2","first-page":"8026","article-title":"PyTorch: An imperative style, high-performance deep learning library","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019), Article No. 721, 8026\u20138037.","journal-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019)"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3072430"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3194085.3194087"},{"key":"e_1_3_2_34_2","first-page":"1","volume-title":"2017 IEEE Visual Communications and Image Processing (VCIP)","author":"Song Rui","year":"2017","unstructured":"Rui Song, Dong Liu, Houqiang Li, and Feng Wu. 2017. Neural network-based arithmetic coding of intra prediction modes in HEVC. In 2017 IEEE Visual Communications and Image Processing (VCIP). IEEE, 1\u20134."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2012.2221191"},{"key":"e_1_3_2_36_2","first-page":"21","volume-title":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","author":"Sun Heming","year":"2020","unstructured":"Heming Sun, Lu Yu, and Jiro Katto. 2020. Fully neural network mode based intra prediction of variable block size. In 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). 
IEEE, 21\u201324."},{"key":"e_1_3_2_37_2","first-page":"1110","volume-title":"CVPR Workshops","author":"Timofte Radu","year":"2017","unstructured":"Radu Timofte, Eirikur Agustsson, Luc Van Gool, Ming-Hsuan Yang, Lei Zhang, and Bee Lim. 2017. NTIRE 2017 challenge on single image super-resolution: Methods and results. In CVPR Workshops. IEEE, 1110\u20131121."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1038\/d41586-023-00288-7"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.10.081"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2944473"},{"issue":"7","key":"e_1_3_2_41_2","first-page":"1803","article-title":"Multi-scale convolutional neural network-based intra prediction for video coding","volume":"30","author":"Wang Yang","year":"2019","unstructured":"Yang Wang, Xiaopeng Fan, Shaohui Liu, Debin Zhao, and Wen Gao. 2019. Multi-scale convolutional neural network-based intra prediction for video coding. IEEE Transactions on Circuits and Systems for Video Technology 30, 7 (2019), 1803\u20131815.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2003.815165"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_1"},{"issue":"3","key":"e_1_3_2_44_2","first-page":"840","article-title":"Convolutional neural network-based fractional-pixel motion compensation","volume":"29","author":"Yan Ning","year":"2018","unstructured":"Ning Yan, Dong Liu, Houqiang Li, Bin Li, Li Li, and Feng Wu. 2018. Convolutional neural network-based fractional-pixel motion compensation. 
IEEE Transactions on Circuits and Systems for Video Technology 29, 3 (2018), 840\u2013853.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_3_2_45_2","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1109\/VCIP49819.2020.9301790","volume-title":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","author":"Yang Kun","year":"2020","unstructured":"Kun Yang, Dong Liu, and Feng Wu. 2020. Deep learning-based nonlinear transform for HEVC intra coding. In 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP). IEEE, 387\u2013390."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2830640"},{"issue":"10","key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"3878","DOI":"10.1109\/TCSVT.2021.3087706","article-title":"Transform coding in the VVC standard","volume":"31","author":"Zhao Xin","year":"2021","unstructured":"Xin Zhao, Seung-Hwan Kim, Yin Zhao, Hilmi E. Egilmez, Moonmo Koo, Shan Liu, Jani Lainema, and Marta Karczewicz. 2021. Transform coding in the VVC standard. IEEE Transactions on Circuits and Systems for Video Technology 31, 10 (Oct. 
2021), 3878\u20133890.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3733108","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T23:53:44Z","timestamp":1752882824000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3733108"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,18]]},"references-count":46,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,31]]}},"alternative-id":["10.1145\/3733108"],"URL":"https:\/\/doi.org\/10.1145\/3733108","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,18]]},"assertion":[{"value":"2025-01-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}