{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T18:50:20Z","timestamp":1777143020424,"version":"3.51.4"},"reference-count":51,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2024,12,19]],"date-time":"2024-12-19T00:00:00Z","timestamp":1734566400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62076053"],"award-info":[{"award-number":["62076053"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["20220203057SF"],"award-info":[{"award-number":["20220203057SF"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011789","name":"Department of Science and Technology of Jilin Province","doi-asserted-by":"publisher","award":["62076053"],"award-info":[{"award-number":["62076053"]}],"id":[{"id":"10.13039\/501100011789","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011789","name":"Department of Science and Technology of Jilin Province","doi-asserted-by":"publisher","award":["20220203057SF"],"award-info":[{"award-number":["20220203057SF"]}],"id":[{"id":"10.13039\/501100011789","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In recent years, the rapid growth of video data posed challenges for storage and transmission. Video compression techniques provided a viable solution to this problem. In this study, we proposed a bidirectional coding video compression model named DeepBiVC, which was based on two-stage learning. 
Firstly, we conducted preprocessing on the video data by segmenting the video flow into groups of continuous image frames, with each group comprising five frames. Then, in the first stage, we developed an image compression module based on an invertible neural network (INN) model to compress the first and last frames of each group. In the second stage, we designed a video compression module that compressed the intermediate frames using bidirectional optical flow estimation. Experimental results indicated that DeepBiVC outperformed other state-of-the-art video compression methods in terms of PSNR and MS-SSIM. Specifically, on the UVG dataset at bpp = 0.3, DeepBiVC achieved a PSNR of 37.16 and an MS-SSIM of 0.98.<\/jats:p>","DOI":"10.3390\/e26121110","type":"journal-article","created":{"date-parts":[[2024,12,19]],"date-time":"2024-12-19T06:50:44Z","timestamp":1734591044000},"page":"1110","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A Novel Video Compression Approach Based on Two-Stage Learning"],"prefix":"10.3390","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1050-2808","authenticated-orcid":false,"given":"Dan","family":"Shao","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Changchun University, Changchun 130022, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ning","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changchun University, Changchun 130022, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pu","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Changchun University, Changchun 130022, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, 
Changchun University, Changchun 130022, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1615-6045","authenticated-orcid":false,"given":"Lin","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Software Technology, Dalian University of Technology, Dalian 116024, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,12,19]]},"reference":[{"key":"ref_1","unstructured":"Cisco, U. (2023). Cisco Annual Internet Report (2018\u20132023) White Paper, Cisco."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"614","DOI":"10.1109\/TCOM.1974.1092258","article-title":"Hybrid Coding of Pictorial Data","volume":"22","author":"Habibi","year":"1974","journal-title":"IEEE Trans. Commun."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1109\/JRPROC.1952.273898","article-title":"A method for the construction of minimum-redundancy codes","volume":"40","author":"Huffman","year":"1952","journal-title":"IRE"},{"key":"ref_4","unstructured":"Taubman, D., and Marcellin, M. (2013). JPEG2000 Image Compression Fundamentals, Standards and Practice, Springer Publishing Company, Incorporated. [1st ed.]."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1004","DOI":"10.1109\/TCOM.1977.1093941","article-title":"A Fast Computational Algorithm for the Discrete Cosine Transform","volume":"25","author":"Chen","year":"1977","journal-title":"IEEE Trans. Commun."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Medvedeva, E.V., Metelev, A.P., and Kryshkina, E.C. (2020, January 11\u201313). Motion compensation method for video encoding. 
Proceedings of the 2020 Moscow Workshop on Electronic and Networking Technologies (MWENT), Moscow, Russia.","DOI":"10.1109\/MWENT47943.2020.9067466"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1109\/TCSVT.2019.2892608","article-title":"Learning for video compression","volume":"30","author":"Chen","year":"2019","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wu, C.Y., Singhal, N., and Krahenbuhl, P. (2018, January 23\u201327). Video compression through image interpolation. Proceedings of the European Conference on Computer Vision (ECCV), Cham, Switzerland.","DOI":"10.1007\/978-3-030-01237-3_26"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cheng, Z., Sun, H., Takeuchi, M., and Katto, J. (2019, January 16\u201320). Learning image and video compression through spatial-temporal energy compaction. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01031"},{"key":"ref_10","unstructured":"Ball\u00e9, J., Laparra, V., and Simoncelli, E.P. (2017, January 24\u201326). End-to-end Optimized Image Compression. Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Toderici, G., Vincent, D., Johnston, N., Jin Hwang, S., Minnen, D., Shor, J., and Covell, M. (2017, January 13). Full Resolution Image Compression with Recurrent Neural Networks. Proceedings of the 34th International Conference on Machine Learning (ICML), Los Alamitos, CA, USA.","DOI":"10.1109\/CVPR.2017.577"},{"key":"ref_12","unstructured":"Rippel, O., and Bourdev, L. (2017, January 13). Real-time adaptive image compression. 
Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, NSW, Australia."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., and Brox, T. (2015, January 7\u201313). Flownet: Learning optical flow with convolutional networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.316"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Ranjan, A., and Black, M.J. (2017, January 22\u201325). Optical flow estimation using a spatial pyramid network. Proceedings of the IEEE\/CVF Conference on Computer Vision Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.291"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lu, G., Ouyang, W., Xu, D., Zhang, X., Cai, C., and Gao, Z. (2019, January 15\u201320). Dvc: An end-to-end deep video compression framework. Proceedings of the IEEE\/CVF Conference on Computer Vision Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01126"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"388","DOI":"10.1109\/JSTSP.2020.3043590","article-title":"Learning for video compression with recurrent auto-encoder and recurrent probability model","volume":"15","author":"Yang","year":"2020","journal-title":"IEEE J. Sel. Top. Signal Process"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/30.125072","article-title":"The JPEG still picture compression standard","volume":"38","author":"Wallace","year":"1992","journal-title":"IEEE Trans. Consum. Electron"},{"key":"ref_18","unstructured":"Britanak, V., Yip, P.C., and Rao, K.R. (2010). Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms, and Integer Approximations, Elsevier. [1st ed.]."},{"key":"ref_19","unstructured":"Google (2023, July 14). Web Picture Format. 
Available online: https:\/\/developers.google.com\/speed\/webp."},{"key":"ref_20","unstructured":"Bellard, F. (2023, July 15). BPG Image Format. Available online: https:\/\/bellard.org\/bpg\/."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1109\/TCSVT.2012.2221526","article-title":"High throughput CABAC entropy coding in HEVC","volume":"22","author":"Sze","year":"2012","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_22","unstructured":"Ball\u00e9, J., Laparra, V., and Simoncelli, E.P. (2015). Density Modeling of Images using a Generalized Normalization Transformation. arXiv."},{"key":"ref_23","unstructured":"Mirza, M., and Osindero, S. (2014). Conditional Generative Adversarial Nets. arXiv."},{"key":"ref_24","unstructured":"Denton, E.L., Chintala, S., and Fergus, R. (2015, January 7\u201312). Deep generative image models using a Laplacian pyramid of adversarial networks. Proceedings of the 28th Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada."},{"key":"ref_25","unstructured":"Dinh, L., Krueger, D., and Bengio, Y. (2014). NICE: Non-linear Independent Components Estimation. arXiv."},{"key":"ref_26","unstructured":"Kingma, D.P., and Dhariwal, P. (2018, January 3\u20138). Glow: Generative flow with invertible 1x1 convolutions. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montr\u00e9al, QC, Canada."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xiao, M., Zheng, S., Liu, C., Wang, Y., He, D., Ke, G., Bian, J., Lin, Z., and Liu, T.Y. (2020, January 23\u201328). Invertible image rescaling. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_8"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Sundararajan, D. (2001). The Discrete Fourier Transform: Theory, Algorithms and Applications, World Scientific. 
[1st ed.].","DOI":"10.1142\/9789812810298"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1109\/PROC.1969.6869","article-title":"Hadamard transform image coding","volume":"57","author":"Pratt","year":"1969","journal-title":"IEEE"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1109\/76.313137","article-title":"Video bridging based on H. 261 standard","volume":"4","author":"Lei","year":"1994","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1109\/35.556485","article-title":"H. 263: Video coding for low-bit-rate communication","volume":"34","author":"Rijkse","year":"1996","journal-title":"IEEE Commun. Mag."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1109\/TCSVT.2003.815165","article-title":"Overview of the H. 264\/AVC video coding standard","volume":"13","author":"Wiegand","year":"2003","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1649","DOI":"10.1109\/TCSVT.2012.2221191","article-title":"Overview of the high efficiency video coding (HEVC) standard","volume":"22","author":"Sullivan","year":"2012","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Hu, Z., Lu, G., and Xu, D. (2021, January 20\u201325). FVC: A new framework towards deep video compression in feature space. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00155"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Yang, R., Mentzer, F., Gool, L.V., and Timofte, R. (2020, January 13\u201319). Learning for video compression with hierarchical quality and recurrent enhancement. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00666"},{"key":"ref_36","unstructured":"Xue, T., Chen, B., Wu, J., Wei, D., and Freeman, W.T. (2017). Video Enhancement with Task-Oriented Flow. arXiv."},{"key":"ref_37","unstructured":"Mercat, A., Viitanen, M., and Vanne, J. (2020, January 28\u201330). UVG dataset: 50\/120fps 4K sequences for video codec analysis and development. Proceedings of the 2020 Multimedia Systems Conference, Seattle, WA, USA."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_39","unstructured":"Dinh, L., Sohl-Dickstein, J.N., and Bengio, S. (2016). Density estimation using Real NVP. arXiv."},{"key":"ref_40","unstructured":"Minnen, D., Ball\u00e9, J., and Toderici, G.D. (2018, January 3\u20138). Joint autoregressive and hierarchical priors for learned image compression. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montr\u00e9al, QC, Canada."},{"key":"ref_41","unstructured":"Duda, J. (2013). Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with compression rate of arithmetic coding. arXiv."},{"key":"ref_42","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhou, X., Hariharan, B., and Snavely, N. (2020, January 23\u201328). Learning feature descriptors using camera pose supervision. 
Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_44"},{"key":"ref_44","unstructured":"Liu, W., Wen, Y., Yu, Z., and Yang, M. (2016). Large-Margin Softmax Loss for Convolutional Neural Networks. arXiv."},{"key":"ref_45","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1016\/j.jvcir.2014.01.008","article-title":"Visual-PSNR measure of image quality","volume":"25","author":"Tanchenko","year":"2014","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_47","first-page":"1398","article-title":"Multiscale structural similarity for image quality assessment","volume":"Volume 2","author":"Wang","year":"2003","journal-title":"Proceedings of the 35th Asilomar Conference on Signals, Systems, and Computers"},{"key":"ref_48","unstructured":"Yang, R., Gool, L.V., and Timofte, R. (2020). OpenDVC: An Open Source Implementation of the DVC Video Compression Method. arXiv."},{"key":"ref_49","first-page":"974","article-title":"End-to-end rate-distortion optimized learned hierarchical bi-directional video compression","volume":"31","author":"Tekalp","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_50","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Proceedings of the 32nd Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada."},{"key":"ref_51","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. 
arXiv."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/12\/1110\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:55:27Z","timestamp":1760115327000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/12\/1110"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,19]]},"references-count":51,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["e26121110"],"URL":"https:\/\/doi.org\/10.3390\/e26121110","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,19]]}}}