{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,4]],"date-time":"2025-11-04T10:23:17Z","timestamp":1762251797268,"version":"3.41.0"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"10","license":[{"start":{"date-parts":[[2011,10,1]],"date-time":"2011-10-01T00:00:00Z","timestamp":1317427200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["937060"],"award-info":[{"award-number":["937060"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2011,10]]},"abstract":"<jats:p>Scaling the performance of a power-limited processor requires decreasing the energy expended per instruction executed, since energy\/op * op\/second is power. To better understand what improvement in processor efficiency is possible, and what must be done to capture it, we quantify the sources of the performance and energy overheads of a 720p HD H.264 encoder running on a general-purpose four-processor CMP system. The initial overheads are large: the CMP was 500x less energy efficient than an Application Specific Integrated Circuit (ASIC) doing the same job. We explore methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding. Broadly applicable optimizations like single instruction, multiple data (SIMD) units improve CMP performance by 14x and energy by 10x, which is still 50x worse than an ASIC. The problem is that the basic operation costs in H.264 are so small that even with a SIMD unit doing over 10 ops per cycle, 90% of the energy is still overhead. 
Achieving ASIC-like performance and efficiency requires algorithm-specific optimizations. For each subalgorithm of H.264, we create a large, specialized functional\/storage unit capable of executing hundreds of operations per instruction. This improves energy efficiency by 160x (instead of 10x), and the final customized CMP reaches the same performance and within 3x of an ASIC solution's energy in comparable area.<\/jats:p>","DOI":"10.1145\/2001269.2001291","type":"journal-article","created":{"date-parts":[[2011,9,23]],"date-time":"2011-09-23T15:12:48Z","timestamp":1316790768000},"page":"85-93","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Understanding sources of inefficiency in general-purpose chips"],"prefix":"10.1145","volume":"54","author":[{"given":"Rehan","family":"Hameed","sequence":"first","affiliation":[{"name":"Stanford University, Stanford, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wajahat","family":"Qadeer","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Megan","family":"Wachs","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Omid","family":"Azizi","sequence":"additional","affiliation":[{"name":"Hicamp Systems, Menlo Park, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alex","family":"Solomatnikov","sequence":"additional","affiliation":[{"name":"Hicamp Systems, Menlo Park, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benjamin C.","family":"Lee","sequence":"additional","affiliation":[{"name":"Duke University, Durham, NC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephen","family":"Richardson","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, 
CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christos","family":"Kozyrakis","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mark","family":"Horowitz","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,10]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2008.1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2006.873163"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2005.05.004"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2005.156"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/968280.968307"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CICC.2001.929839"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.1999.752522"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000108"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSID.2007.140"},{"key":"e_1_2_1_10_1","volume-title":"Motion Estimation with Intel Streaming SIMD extensions 4 (Intel SSE4)","author":"Intel Corporation","year":"2008","unstructured":"Intel Corporation . Motion Estimation with Intel Streaming SIMD extensions 4 (Intel SSE4) ( 2008 ). Intel Corporation. Motion Estimation with Intel Streaming SIMD extensions 4 (Intel SSE4) (2008)."},{"volume-title":"Joint Video Team Reference Software JM8.6","year":"2004","key":"e_1_2_1_11_1","unstructured":"ITU-T. Joint Video Team Reference Software JM8.6 ( 2004 ). ITU-T. 
Joint Video Team Reference Software JM8.6 (2004)."},{"key":"e_1_2_1_12_1","volume-title":"IEEE International Conference on Image Processing ICIP'04","author":"Iverson V.","year":"2004","unstructured":"Iverson , V. , McVeigh , J. , Reese , B. Real -time H. 264\/AVC codec on Intel architectures. In IEEE International Conference on Image Processing ICIP'04 ( 2004 ). Iverson, V., McVeigh, J., Reese, B. Real-time H.264\/AVC codec on Intel architectures. In IEEE International Conference on Image Processing ICIP'04 (2004)."},{"key":"e_1_2_1_13_1","volume-title":"Creating power-efficient application engines for SOC design. SOC Central","author":"Kathail V.","year":"2005","unstructured":"Kathail , V. Creating power-efficient application engines for SOC design. SOC Central ( 2005 ). Kathail, V. Creating power-efficient application engines for SOC design. SOC Central (2005)."},{"volume-title":"Fast integer pel and fractional pel motion estimation for JVT. JVT-F017","year":"2002","key":"e_1_2_1_14_1","unstructured":"MPEG, I., VCEG, I.-T. Fast integer pel and fractional pel motion estimation for JVT. JVT-F017 ( 2002 ). MPEG, I., VCEG, I.-T. Fast integer pel and fractional pel motion estimation for JVT. JVT-F017 (2002)."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/996566.996755"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.81"},{"key":"e_1_2_1_17_1","volume-title":"Visual Communications and Image Processing","author":"Shojania H.","year":"2005","unstructured":"Shojania , H. , Sudharsanan , S. A VLSI architecture for high performance CABAC encoding . In Visual Communications and Image Processing ( 2005 ). Shojania, H., Sudharsanan, S. A VLSI architecture for high performance CABAC encoding. In Visual Communications and Image Processing (2005)."},{"key":"e_1_2_1_18_1","volume-title":"Implementing the advanced encryption standard on Xtensa processors. 
Application Notes","author":"Tensilica Inc.","year":"2009","unstructured":"Tensilica Inc. Implementing the advanced encryption standard on Xtensa processors. Application Notes ( 2009 ). Tensilica Inc. Implementing the advanced encryption standard on Xtensa processors. Application Notes (2009)."},{"key":"e_1_2_1_19_1","volume-title":"Implementing the fast Fourier transform (FFT). Application Notes","author":"Tensilica Inc.","year":"2005","unstructured":"Tensilica Inc. Implementing the fast Fourier transform (FFT). Application Notes ( 2005 ). Tensilica Inc. Implementing the fast Fourier transform (FFT). Application Notes (2005)."},{"key":"e_1_2_1_20_1","volume-title":"The what, why, and how of configurable processors","author":"Tensilica Inc.","year":"2008","unstructured":"Tensilica Inc. The what, why, and how of configurable processors ( 2008 ). Tensilica Inc. The what, why, and how of configurable processors (2008)."},{"key":"e_1_2_1_21_1","volume-title":"Xtensa LX2 benchmarks","author":"Tensilica Inc.","year":"2005","unstructured":"Tensilica Inc. Xtensa LX2 benchmarks ( 2005 ). Tensilica Inc. Xtensa LX2 benchmarks (2005)."},{"key":"e_1_2_1_22_1","volume-title":"Xtensa processor extensions for data encryption standard (DES). Application Notes","author":"Tensilica Inc.","year":"2008","unstructured":"Tensilica Inc. Xtensa processor extensions for data encryption standard (DES). Application Notes ( 2008 ). Tensilica Inc. Xtensa processor extensions for data encryption standard (DES). Application Notes (2008)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1362622.1362674"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555815.1555773"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of IEEE International Conference on Image Processing","author":"Yin P.","year":"2003","unstructured":"Yin , P. , Tourapis , H.-Y.C. , Tourapis , A.M. , Boyce , J. Fast mode decision and motion estimation for JVT\/h.264 . 
In Proceedings of IEEE International Conference on Image Processing ( 2003 ). Yin, P., Tourapis, H.-Y.C., Tourapis, A.M., Boyce, J. Fast mode decision and motion estimation for JVT\/h.264. In Proceedings of IEEE International Conference on Image Processing (2003)."}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2001269.2001291","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2001269.2001291","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:53Z","timestamp":1750244393000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2001269.2001291"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10]]},"references-count":25,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2011,10]]}},"alternative-id":["10.1145\/2001269.2001291"],"URL":"https:\/\/doi.org\/10.1145\/2001269.2001291","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"type":"print","value":"0001-0782"},{"type":"electronic","value":"1557-7317"}],"subject":[],"published":{"date-parts":[[2011,10]]},"assertion":[{"value":"2011-10-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}