{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T13:13:51Z","timestamp":1753881231190,"version":"3.41.2"},"reference-count":31,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2012,5,29]],"date-time":"2012-05-29T00:00:00Z","timestamp":1338249600000},"content-version":"vor","delay-in-days":149,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"funder":[{"DOI":"10.13039\/100011102","name":"Seventh Framework Programme","doi-asserted-by":"publisher","award":["247615"],"award-info":[{"award-number":["247615"]}],"id":[{"id":"10.13039\/100011102","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["VLSI Design"],"published-print":{"date-parts":[[2012,1]]},"abstract":"<jats:p>The optimization process of a H.264\/AVC encoder on three different architectures is presented. The architectures are multi\u2010 and singlecore and SIMD instruction sets have different vector registers size. The need of code optimization is fundamental when addressing HD resolutions with real\u2010time constraints. The encoder is subdivided in functional modules in order to better understand where the optimization is a key factor and to evaluate in details the performance improvement. Common issues in both partitioning a video encoder into parallel architectures and SIMD optimization are described, and author solutions are presented for all the architectures. Besides showing efficient video encoder implementations, one of the main purposes of this paper is to discuss how the characteristics of different architectures and different set of SIMD instructions can impact on the target application performance. Results about the achieved speedup are provided in order to compare the different implementations and evaluate the more suitable solutions for present and next generation video\u2010coding algorithms.<\/jats:p>","DOI":"10.1155\/2012\/413747","type":"journal-article","created":{"date-parts":[[2012,6,5]],"date-time":"2012-06-05T15:22:53Z","timestamp":1338909773000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["An Efficient Multi\u2010Core SIMD Implementation for H.264\/AVC Encoder"],"prefix":"10.1155","volume":"2012","author":[{"given":"M.","family":"Bariani","sequence":"first","affiliation":[]},{"given":"P.","family":"Lambruschini","sequence":"additional","affiliation":[]},{"given":"M.","family":"Raggio","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2012,5,29]]},"reference":[{"key":"e_1_2_8_1_2","unstructured":"VC-1 Compressed Video Bitstream Format and Decoding Process SMPTE 421M-2006 SMPTE Standard 2006."},{"key":"e_1_2_8_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2003.815165"},{"key":"e_1_2_8_3_2","doi-asserted-by":"crossref","unstructured":"SullivanG. J. TopiwalaP. andLuthraA. The H.264\/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions Applications of Digital Image Processing XXVII August 2004 Proceedings of SPIE.","DOI":"10.1117\/12.564457"},{"key":"e_1_2_8_4_2","doi-asserted-by":"crossref","unstructured":"MarpeD. WiegandT. andGordonS. H.264\/MPEG4-AVC fidelity range extensions: tools profiles performance and application areas IEEE International Conference on Image Processing (ICIP \u203205) September 2005 593\u2013596 2-s2.0-33749599768 https:\/\/doi.org\/10.1109\/ICIP.2005.1529820.","DOI":"10.1109\/ICIP.2005.1529820"},{"key":"e_1_2_8_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2007.905532"},{"key":"e_1_2_8_6_2","unstructured":"Collaborative Team on Video Coding (JCT-VC)Joint WD4: Working Draft 4 of High-Efficiency Video Coding 6th Meeting Torino Italy July 2011."},{"key":"e_1_2_8_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11265\u2010007\u20100116\u2010z"},{"key":"e_1_2_8_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2011.5955207"},{"key":"e_1_2_8_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2008.923398"},{"key":"e_1_2_8_10_2","doi-asserted-by":"crossref","unstructured":"RintaluomaT.andSilv\u00e9nO. SIMD performance in software based mobile video coding 10th International Conference on Embedded Computer Systems: Architectures Modeling and Simulation (IC-SAMOS \u203210) July 2010 79\u201385 2-s2.0-78650928300 https:\/\/doi.org\/10.1109\/ICSAMOS.2010.5642079.","DOI":"10.1109\/ICSAMOS.2010.5642079"},{"key":"e_1_2_8_11_2","doi-asserted-by":"crossref","unstructured":"LvH. MaL. andLiuH. Analysis and optimization of the UMHexagons algorithm in H.264 based on SIMD 2nd International Conference on Communication Systems Networks and Applications (ICCSNA \u203210) July 2010 239\u2013244 2-s2.0-78649271799 https:\/\/doi.org\/10.1109\/ICCSNA.2010.5588702.","DOI":"10.1109\/ICCSNA.2010.5588702"},{"key":"e_1_2_8_12_2","doi-asserted-by":"crossref","unstructured":"ZhouX. Q. LiE. andChenY.-K. Implementation of H.264 decoder on general-purpose processors with media instructions Image and Video Communications and Processing January 2003 Santa Clara Calif USA.","DOI":"10.1117\/12.484746"},{"key":"e_1_2_8_13_2","doi-asserted-by":"crossref","unstructured":"LeeJ. MoonS. andSungW. H.264 decoder optimization exploiting SIMD instructions IEEE Asia-Pacific Conference on Circuits and Systems (APCCAS \u203204) December 2004 1149\u20131152 2-s2.0-21644450237.","DOI":"10.1109\/APCCAS.2004.1413088"},{"key":"e_1_2_8_14_2","doi-asserted-by":"crossref","first-page":"1769","DOI":"10.1109\/TCSVT.2011.2130250","article-title":"Improved SIMD architecture for high performance video processors","volume":"21","author":"Lo W.","year":"2011","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_2_8_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2003.1223637"},{"key":"e_1_2_8_16_2","doi-asserted-by":"publisher","DOI":"10.1155\/2007\/58431"},{"key":"e_1_2_8_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11265\u2010008\u20100304\u20105"},{"key":"e_1_2_8_18_2","doi-asserted-by":"crossref","unstructured":"FaraboschiP. BrownG. FisherJ. A. DesoliG. andHomewoodF. Lx: a technology platform for customizable VLIW embedded processing 27th Annual International Symposium on Computer Architecture (ISCA \u203200) June 2000 203\u2013213 2-s2.0-0033703885.","DOI":"10.1145\/339647.339682"},{"key":"e_1_2_8_19_2","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/MSSC.2009.932433","article-title":"VLIW processors: from blue sky to best buy","volume":"1","author":"Fisher J.","year":"2009","journal-title":"IEEE Solid-State Circuits Magazine"},{"key":"e_1_2_8_20_2","doi-asserted-by":"crossref","unstructured":"CosteN. GaravelH. HermannsH. LangF. MateescuR. andSerweW. Ten Years of Performance Evaluation for Concurrent Systems using CADP 4th International Symposium on Leveraging Applications of Formal Methods Verification and Validation ISoLA 2010 Heraklion Greece.","DOI":"10.1007\/978-3-642-16561-0_18"},{"key":"e_1_2_8_21_2","doi-asserted-by":"crossref","unstructured":"PandiniD. DesoliG. andCremonesiA. Computing and design for software and silicon manufacturing IFIP International Conference on Very Large Scale Integration (VLSI \u203207) October 2007 122\u2013127 2-s2.0-50149111602 https:\/\/doi.org\/10.1109\/VLSISOC.2007.4402484.","DOI":"10.1109\/VLSISOC.2007.4402484"},{"key":"e_1_2_8_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCAS.2006.1648987"},{"key":"e_1_2_8_23_2","unstructured":"BeniniL. P2012: a many-core platform for 10Gops\/mm2 multimedia computing 21st IEEE International Symposium on Rapid System Prototyping June 2010 Fairfax Va USA."},{"key":"e_1_2_8_24_2","doi-asserted-by":"crossref","unstructured":"SilvanoC. FornaciariW. Crespi ReghizziS. AgostaG. PalermoG. ZaccariaV. BellasiP. CastroF. CorbettaS. Di BiagioA. SpezialeE. TartaraM. SiorpaesD. H\u00fcbertH. StabernackB. BrandenburgJ. PalkovicM. RaghavanP. Ykman-CouvreurC. BartzasA. XydisS. SoudrisD. KempfT. AscheidG. LeupersR. MeyrH. AnsariJ. M\u00e4h\u00f6nenP. andVanthournoutB. 2PARMA: parallel paradigms and run-time management techniques for many-core architectures IEEE Annual Symposium on VLSI July 2010 494\u2013499 2-s2.0-77957895217 https:\/\/doi.org\/10.1109\/ISVLSI.2010.93.","DOI":"10.1109\/ISVLSI.2010.93"},{"key":"e_1_2_8_25_2","doi-asserted-by":"crossref","unstructured":"MucciC. VanzoliniL. MiriminI. GazzolaD. DeleddaA. GollerS. KnaebleinJ. SchneiderA. CiccarelliL. andCampiF. Implementation of parallel LFSR-based applications on an adaptive DSP featuring a Pipelined Configurable Gate Array Design Automation and Test in Europe (DATE \u203208) March 2008 1444\u20131449 2-s2.0-49749123800 https:\/\/doi.org\/10.1109\/DATE.2008.4484877.","DOI":"10.1109\/DATE.2008.4484877"},{"key":"e_1_2_8_26_2","doi-asserted-by":"crossref","unstructured":"PaulinP. Programming challenges & solutions for multi-processor SoCs: An industrial perspective Design Automation Conference (DAC \u203211) June 2011.","DOI":"10.1145\/2024724.2024785"},{"key":"e_1_2_8_27_2","unstructured":"KumarA. AlfonsoD. PezzoniL. andOlmoG. A complexity scalable H.264\/AVC encoder for mobile terminals European Signal Processing Conference (EUSIPCO \u203208) August 2008 Lausanne Switzerland."},{"key":"e_1_2_8_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2006.871388"},{"key":"e_1_2_8_29_2","doi-asserted-by":"crossref","unstructured":"ZattB. ShafiqueM. SampaioF. AgostiniL. BampiS. andHenkelJ. Run-Time Adaptive Energy-Aware Motion and Disparity Estimation in Multiview Video Coding 48th Design Automation Conference (DAC \u203211) June 2011 San Diego Calif USA 1026\u20131031.","DOI":"10.1145\/2024724.2024950"},{"key":"e_1_2_8_30_2","unstructured":"BarianiM. BarbieriI. BrizzolaraD. andRaggioM. H.264 implementation on SIMD VLIW cores STreaming Day 2007 Genova Italy."},{"key":"e_1_2_8_31_2","doi-asserted-by":"crossref","unstructured":"LubobyaC. S. DlodloM. E. de JagerG. andFergusonK. L. SIMD implementation of integer DCT and hadamard transforms in H.264\/AVC encoder Proceedings of the IEEE AFRICON September 2011 1\u20135.","DOI":"10.1109\/AFRCON.2011.6071998"}],"container-title":["VLSI Design"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/archive\/2012\/413747.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/archive\/2012\/413747.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2012\/413747","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,29]],"date-time":"2025-03-29T09:51:52Z","timestamp":1743241912000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2012\/413747"}},"subtitle":[],"editor":[{"given":"Muhammad","family":"Shafique","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2012,1]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,1]]}},"alternative-id":["10.1155\/2012\/413747"],"URL":"https:\/\/doi.org\/10.1155\/2012\/413747","archive":["Portico"],"relation":{},"ISSN":["1065-514X","1563-5171"],"issn-type":[{"type":"print","value":"1065-514X"},{"type":"electronic","value":"1563-5171"}],"subject":[],"published":{"date-parts":[[2012,1]]},"assertion":[{"value":"2011-11-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-03-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"413747"}}