{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T13:55:40Z","timestamp":1773842140570,"version":"3.50.1"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2021,11,29]],"date-time":"2021-11-29T00:00:00Z","timestamp":1638144000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,11,29]],"date-time":"2021-11-29T00:00:00Z","timestamp":1638144000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100008679","name":"Universidad de C\u00f3rdoba","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008679","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>CAVLC (Context-Adaptive Variable Length Coding) is a high-performance entropy method for video and image compression. It is the most commonly used entropy method in the video standard H.264. In recent years, several hardware accelerators for CAVLC have been designed. In contrast, high-performance software implementations of CAVLC (e.g., GPU-based) are scarce. A high-performance GPU-based implementation of CAVLC is desirable in several scenarios. On the one hand, it can be exploited as the entropy component in GPU-based H.264 encoders, which are a very suitable solution when GPU built-in H.264 hardware encoders lack certain necessary functionality, such as data encryption and information hiding. 
On the other hand, a GPU-based implementation of CAVLC can be reused in a wide variety of GPU-based compression systems for encoding images and videos in formats other than H.264, such as medical images. This is not possible with hardware implementations of CAVLC, as they are non-separable components of hardware H.264 encoders. In this paper, we present CAVLCU, an efficient implementation of CAVLC on GPU, which is based on four key ideas. First, we use only one kernel to avoid the long latency global memory accesses required to transmit intermediate results among different kernels, and the costly launches and terminations of additional kernels. Second, we apply an efficient synchronization mechanism for thread-blocks (In this paper, to prevent confusion, a block of pixels of a frame will be referred to as simply <jats:italic>block<\/jats:italic> and a GPU thread block as <jats:italic>thread-block<\/jats:italic>.) that process adjacent frame regions (in horizontal and vertical dimensions) to share results in global memory space. Third, we exploit fully the available global memory bandwidth by using vectorized loads to move directly the quantized transform coefficients to registers. Fourth, we use register tiling to implement the zigzag sorting, thus obtaining high instruction-level parallelism. 
An exhaustive experimental evaluation showed that our approach is between 2.5<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\times$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> and 5.4<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\times$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> faster than the only state-of-the-art GPU-based implementation of CAVLC.\n<\/jats:p>","DOI":"10.1007\/s11227-021-04183-8","type":"journal-article","created":{"date-parts":[[2021,11,29]],"date-time":"2021-11-29T10:02:59Z","timestamp":1638180179000},"page":"7556-7590","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["CAVLCU: an efficient GPU-based implementation of CAVLC"],"prefix":"10.1007","volume":"78","author":[{"given":"Antonio","family":"Fuentes-Alventosa","sequence":"first","affiliation":[]},{"given":"Juan","family":"G\u00f3mez-Luna","sequence":"additional","affiliation":[]},{"given":"Jos\u00e9 Maria","family":"Gonz\u00e1lez-Linares","sequence":"additional","affiliation":[]},{"given":"Nicol\u00e1s","family":"Guil","sequence":"additional","affiliation":[]},{"given":"R.","family":"Medina-Carnicer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,11,29]]},"reference":[{"issue":"1\u20132","key":"4183_CR1","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1007\/s11554-007-0054-9","volume":"3","author":"K Babionitakis","year":"2008","unstructured":"Babionitakis K, Doumenis G, Georgakarakos G, Lentaris G, Nakos K, Reisis D, Sifnaios I, Vlassopoulos N (2008) A real-time H. 264\/AVC VLSI encoder architecture. 
J Real-Time Image Process 3(1\u20132):43\u201359","journal-title":"J Real-Time Image Process"},{"key":"4183_CR2","volume-title":"Multimedia Technologies","author":"A Banerji","year":"2010","unstructured":"Banerji A, Ghosh AM (2010) Multimedia Technologies. Tata McGraw Hill, New Delhi"},{"key":"4183_CR3","doi-asserted-by":"crossref","unstructured":"Chang C W, Lin W H, Yu H C, Fan CP (2014) A high throughput CAVLC architecture design with two-path parallel coefficients procedure for digital cinema 4K resolution H. 264\/AVC encoding. In: Circuits and Systems (ISCAS), 2014 IEEE International Symposium on (pp. 2616-2619). IEEE","DOI":"10.1109\/ISCAS.2014.6865709"},{"key":"4183_CR4","first-page":"017","volume":"3","author":"X Chu","year":"2012","unstructured":"Chu X, Wu S, Chang F, He W (2012) Efficient implementation of the CAVLC entropy encoder based on FPGA [J]. J Xidian Univ 3:017","journal-title":"J Xidian Univ"},{"key":"4183_CR5","doi-asserted-by":"crossref","unstructured":"Damak T, Werda I, Samet A, Masmoudi N (2008) DSP CAVLC implementation and optimization for H. 264\/AVC baseline encoder. In: Electronics, Circuits and Systems, 2008. ICECS 2008. 15th IEEE International Conference on (pp. 45-48). IEEE","DOI":"10.1109\/ICECS.2008.4674787"},{"key":"4183_CR6","doi-asserted-by":"crossref","unstructured":"El-Ghobashy WA, Ebian M, Mowafi O, Zekry AA (2015) An Efficient Implementation Method of H. 264 CAVLC video coding using FPGA. In: Computer Engineering Conference (ICENCO), 2015 11th International (pp. 212-216). IEEE","DOI":"10.1109\/ICENCO.2015.7416350"},{"key":"4183_CR7","doi-asserted-by":"crossref","unstructured":"Fuentes-Alventosa A, G\u00f3mez-Luna J, Gonz\u00e1lez-Linares JM, & Guil N (2014) CUVLE: Variable-Length Encoding on CUDA. In: Design and Architectures for Signal and Image Processing (DASIP), 2014 Conference on(pp. 1-6). 
IEEE","DOI":"10.1109\/DASIP.2014.7115637"},{"key":"4183_CR8","doi-asserted-by":"crossref","unstructured":"G\u00f3mez-Luna J, Chang LW, Sung IJ, Hwu WM, Guil N (2015) In-Place Data Sliding Algorithms for Many-Core Architectures. In: Parallel Processing (ICPP), 2015 44th International Conference on (pp. 210-219). IEEE","DOI":"10.1109\/ICPP.2015.30"},{"key":"4183_CR9","unstructured":"Google Inc. WebP compression study. Draft 0.1 (May 2011). https:\/\/developers.google.com\/speed\/webp\/docs\/webp\\_study"},{"key":"4183_CR10","unstructured":"Google Inc. Comparative study of WebP, JPEG, and JPEG2000 (August 2012). https:\/\/developers.google.com\/speed\/webp\/docs\/c\\_study"},{"key":"4183_CR11","doi-asserted-by":"crossref","unstructured":"Hoffman MP, Balster EJ, Scarpino F, Hill K (2011) An Efficient Software Implementation of the CAVLC Encoder for H.264\/AVC. In: Proceedings of the 2011 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, , pp. 333-337","DOI":"10.1109\/NAECON.2011.6183127"},{"issue":"1","key":"4183_CR12","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1007\/s11554-013-0345-2","volume":"11","author":"MP Hoffman","year":"2016","unstructured":"Hoffman MP, Balster EJ, Turri WF (2016) High-throughput CAVLC architecture for real-time H. 264 coding using reconfigurable devices. J Real-Time Image Process 11(1):75\u201382","journal-title":"J Real-Time Image Process"},{"issue":"8","key":"4183_CR13","first-page":"637","volume":"57","author":"SC Hsia","year":"2010","unstructured":"Hsia SC, Liao WH (2010) Forward computations for context-adaptive variable-length coding design. IEEE Trans Circ Syst II Exp Briefs 57(8):637\u2013641","journal-title":"IEEE Trans Circ Syst II Exp Briefs"},{"key":"4183_CR14","unstructured":"ITU-T Recommendation H.264 (2019) Advanced video coding for generic audiovisual services"},{"key":"4183_CR15","unstructured":"Khronos group: OpenCL (2020). 
https:\/\/www.khronos.org\/opencl\/"},{"key":"4183_CR16","doi-asserted-by":"crossref","unstructured":"Kim SM, Kim SB, Hong Y, Won CS (2007) Data Hiding on H. 264\/AVC Compressed Video. In: International Conference Image Analysis and Recognition (pp. 698-707). Springer, Berlin, Heidelberg","DOI":"10.1007\/978-3-540-74260-9_62"},{"issue":"2","key":"4183_CR17","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1007\/s11235-010-9372-5","volume":"49","author":"K Liao","year":"2012","unstructured":"Liao K, Lian S, Guo Z, Wang J (2012) Efficient information hiding in H 264\/AVC video coding. Telecommun Syst 49(2):261\u2013269","journal-title":"Telecommun Syst"},{"key":"4183_CR18","unstructured":"Luitjens J (2013) CUDA Pro Tip: Increase Performance with Vectorized Memory Access, Dec. https:\/\/devblogs.nvidia.com\/cuda-pro-tip-increase-performance-with-vectorized-memory-access\/"},{"key":"4183_CR19","doi-asserted-by":"crossref","unstructured":"Mian C, Jia J, Lei Y (2007) An H. 264 Video Encryption Algorithm Based on Entropy Coding. In: Third International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2007) (Vol. 2, pp. 41-44). IEEE","DOI":"10.1109\/IIH-MSP.2007.86"},{"key":"4183_CR20","doi-asserted-by":"crossref","unstructured":"Mohanty M, Ooi W T (2012) Histopathology Image Streaming. In: Pacific-Rim Conference on Multimedia (pp. 534-545). Springer, Berlin, Heidelberg","DOI":"10.1007\/978-3-642-34778-8_50"},{"key":"4183_CR21","doi-asserted-by":"crossref","unstructured":"Mukherjee R, Banerjee A, Maulik A, Chakrabarty I, Dutta PK, Ray AK (2017) An Efficient VLSI Design of CAVLC Encoder. In: Region 10 Conference, TENCON 2017-2017 IEEE (pp. 805-810). IEEE","DOI":"10.1109\/TENCON.2017.8227969"},{"key":"4183_CR22","unstructured":"NVIDIA: CUDA C Best Practices Guide 11.0 (2020). 
https:\/\/docs.nvidia.com\/cuda\/cuda-c-best-practices-guide\/index.html"},{"key":"4183_CR23","unstructured":"NVIDIA: CUDA Math API (2020) https:\/\/docs.nvidia.com\/cuda\/cuda-math-api\/index.html"},{"key":"4183_CR24","unstructured":"NVIDIA: CUDA Occupancy Calculator (2020) https:\/\/docs.nvidia.com\/cuda\/cuda-occupancy-calculator\/CUDA\\_Occupancy\\_Calculator.xls"},{"key":"4183_CR25","unstructured":"NVIDIA: CUDA C Programming Guide 11.0 (2020) https:\/\/docs.nvidia.com\/cuda\/cuda-c-programming-guide\/index.html"},{"key":"4183_CR26","unstructured":"NVIDIA: CUDA Zone (2020) https:\/\/developer.nvidia.com\/category\/zone\/cuda-zone"},{"key":"4183_CR27","unstructured":"NVIDIA: NVENC Video Encoder API Programming Guide (2020) https:\/\/docs.nvidia.com\/video-technologies\/video-codec-sdk\/nvenc-video-encoder-api-prog-guide\/index.html"},{"key":"4183_CR28","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1016\/j.mejo.2017.07.013","volume":"67","author":"M Orlandic","year":"2017","unstructured":"Orlandic M, Svarstad K (2017) An efficient hardware architecture of CAVLC encoder based on stream processing. Microelectron J 67:43\u201349","journal-title":"Microelectron J"},{"key":"4183_CR29","unstructured":"Ozer J (2016) Encoding for Multiple Screen Delivery. Udemy"},{"issue":"20","key":"4183_CR30","first-page":"603","volume":"118","author":"C Priya","year":"2018","unstructured":"Priya C, Ramya C (2018) Medical image compression based on fuzzy segmentation. Int J Pure Appl Math 118(20):603\u2013610","journal-title":"Int J Pure Appl Math"},{"key":"4183_CR31","doi-asserted-by":"crossref","unstructured":"Ren J, He Y, Wu W, Wen M, Wu N, Zhang C (2009) Software parallel CAVLC encoder based on stream processing. In: Embedded Systems for Real-Time Multimedia, 2009. ESTIMedia 2009. IEEE\/ACM\/IFIP 7th Workshop on (pp. 126-133). 
IEEE","DOI":"10.1109\/ESTMED.2009.5336822"},{"key":"4183_CR32","doi-asserted-by":"crossref","unstructured":"Richardson, Iain EG (2010) The H.264 Advanced Video Compression Standard, Wiley: Hoboken","DOI":"10.1002\/9780470989418"},{"key":"4183_CR33","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-84882-903-9","volume-title":"Handbook of data compression","author":"D Salomon","year":"2010","unstructured":"Salomon D, Motta G (2010) Handbook of data compression. Springer, New York"},{"key":"4183_CR34","doi-asserted-by":"crossref","unstructured":"Shahid Z, Chaumont M, Puech W (2009) Fast Protection of H. 264\/AVC by Selective Encryption of CABAC. In: 2009 IEEE International Conference on Multimedia and Expo (pp. 1038-1041). IEEE","DOI":"10.1109\/ICME.2009.5202675"},{"key":"4183_CR35","doi-asserted-by":"crossref","unstructured":"Shahid Z, Chaumont M, Puech W (2009) Fast protection of H. 264\/AVC by selective encryption. In: Proceedings Of The Singaporean-French Ipal Symposium 2009: SinFra\u201909 (pp. 11-21)","DOI":"10.1142\/9789814277563_0002"},{"issue":"5","key":"4183_CR36","doi-asserted-by":"publisher","first-page":"565","DOI":"10.1109\/TCSVT.2011.2129090","volume":"21","author":"Z Shahid","year":"2011","unstructured":"Shahid Z, Chaumont M, Puech W (2011) Fast protection of H 264\/AVC by selective encryption of CAVLC and CABAC for I and P frames. IEEE Trans Circ Syst Video Technol 21(5):565\u2013576","journal-title":"IEEE Trans Circ Syst Video Technol"},{"key":"4183_CR37","doi-asserted-by":"crossref","unstructured":"Sridhar KV, Prasad KK (2008) Medical Image Compression Using Advanced Coding Technique. In: 2008 9th International Conference on Signal Processing (pp. 2142-2145). IEEE","DOI":"10.1109\/ICOSP.2008.4697570"},{"key":"4183_CR38","doi-asserted-by":"crossref","unstructured":"Su H, Wen M, Wu N, Ren J, Zhang C (2014) Efficient parallel video processing techniques on GPU: from framework to implementation. 
The Sci World J","DOI":"10.1155\/2014\/716020"},{"key":"4183_CR39","doi-asserted-by":"crossref","unstructured":"Su H, Zhang C, Chai J, Wen M, Wu N, Ren J, A High-Efficient Software Parallel CAVLC Encoder Based on GPU. In: 2011 34th International Conference on Telecommunications and Signal Processing (TSP), Budapest, 2011, pp. 534-540","DOI":"10.1109\/TSP.2011.6043672"},{"key":"4183_CR40","doi-asserted-by":"crossref","unstructured":"Tabash FK, Izharuddin M (2017) Efficient Encryption Technique for H. 264\/AVC Based on CAVLC and Baker\u2019s Map. In: 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI) (pp. 2759-2764). IEEE","DOI":"10.1109\/ICPCSI.2017.8392221"},{"key":"4183_CR41","first-page":"20","volume":"45","author":"FK Tabash","year":"2019","unstructured":"Tabash FK, Izharuddin M, Tabash MI (2019) Encryption techniques for H. 264\/AVC videos: a literature review. J Inform Secur Appl 45:20\u201334","journal-title":"J Inform Secur Appl"},{"issue":"9","key":"4183_CR42","doi-asserted-by":"publisher","first-page":"1476","DOI":"10.1109\/TCSVT.2013.2248588","volume":"23","author":"Y Wang","year":"2013","unstructured":"Wang Y, O\u2019Neill M, Kurugollu F (2013) A tunable encryption scheme and analysis of fast selective encryption for CAVLC and CABAC in H. 264\/AVC. IEEE Trans Circ Syst Video Technol 23(9):1476\u20131490","journal-title":"IEEE Trans Circ Syst Video Technol"},{"key":"4183_CR43","doi-asserted-by":"crossref","unstructured":"Xiao Z, Baas B (2008) A High-Performance Parallel CAVLC Encoder on a Fine-Grained Many-Core System. In: Computer Design. ICCD 2008. In: IEEE International Conference on (pp. 248-254). IEEE","DOI":"10.1109\/ICCD.2008.4751869"},{"key":"4183_CR44","unstructured":"Xiph.org Video Test Media [derf\u2019s collection] (2020). 
https:\/\/media.xiph.org\/video\/derf\/"},{"issue":"4","key":"4183_CR45","doi-asserted-by":"publisher","first-page":"596","DOI":"10.1109\/TIFS.2014.2302899","volume":"9","author":"D Xu","year":"2014","unstructured":"Xu D, Wang R, Shi YQ (2014) Data hiding in encrypted H. 264\/AVC video streams by codeword substitution. IEEE Trans Inform Foren Secur 9(4):596\u2013606","journal-title":"IEEE Trans Inform Foren Secur"},{"key":"4183_CR46","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1016\/j.jvcir.2016.02.002","volume":"36","author":"D Xu","year":"2016","unstructured":"Xu D, Wang R, Shi YQ (2016) An improved scheme for data hiding in encrypted H. 264\/AVC videos. J Vis Commun Image Rep 36:229\u2013242","journal-title":"J Vis Commun Image Rep"},{"issue":"8","key":"4183_CR47","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1145\/2517327.2442539","volume":"48","author":"S Yan","year":"2013","unstructured":"Yan S, Long G, Zhang Y (2013) StreamScan: fast scan algorithms for GPUs without global barrier synchronization. 
ACM Sigplan Notices 48(8):229\u2013238","journal-title":"ACM Sigplan Notices"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-021-04183-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-021-04183-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-021-04183-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,1]],"date-time":"2022-04-01T13:52:57Z","timestamp":1648821177000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-021-04183-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,29]]},"references-count":47,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["4183"],"URL":"https:\/\/doi.org\/10.1007\/s11227-021-04183-8","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,29]]},"assertion":[{"value":"27 October 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 November 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}