{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:16:33Z","timestamp":1750306593691,"version":"3.41.0"},"reference-count":15,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,3,23]],"date-time":"2015-03-23T00:00:00Z","timestamp":1427068800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100007245","name":"Microelectronics Advanced Research Corporation","doi-asserted-by":"publisher","award":["1041388-237984"],"award-info":[{"award-number":["1041388-237984"]}],"id":[{"id":"10.13039\/100007245","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["HR0011-11-C-0007"],"award-info":[{"award-number":["HR0011-11-C-0007"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2015,3,23]]},"abstract":"<jats:p>General-purpose processors, while tremendously versatile, pay a huge cost for their flexibility by wasting over 99% of the energy in programmability overheads. We observe that reducing this waste requires tuning data storage and compute structures and their connectivity to the data-flow and data-locality patterns in the algorithms. 
Hence, by backing off from full programmability and instead targeting key data-flow patterns used in a domain, we can create efficient engines that can be programmed and reused across a wide range of applications within that domain.<\/jats:p>\n          <jats:p>We present the Convolution Engine (CE)---a programmable processor specialized for the convolution-like data-flow prevalent in computational photography, computer vision, and video processing. The CE achieves energy efficiency by capturing data-reuse patterns, eliminating data transfer overheads, and enabling a large number of operations per memory access. We demonstrate that the CE is within a factor of 2--3\u00d7 of the energy and area efficiency of custom units optimized for a single kernel. The CE improves energy and area efficiency by 8--15\u00d7 over data-parallel Single Instruction Multiple Data (SIMD) engines for most image processing applications.<\/jats:p>","DOI":"10.1145\/2735841","type":"journal-article","created":{"date-parts":[[2015,3,24]],"date-time":"2015-03-24T12:26:59Z","timestamp":1427200019000},"page":"85-93","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":22,"title":["Convolution engine"],"prefix":"10.1145","volume":"58","author":[{"given":"Wajahat","family":"Qadeer","sequence":"first","affiliation":[{"name":"Palo Alto, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rehan","family":"Hameed","sequence":"additional","affiliation":[{"name":"Palo Alto, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ofer","family":"Shacham","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Preethi","family":"Venkatesan","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, 
CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christos","family":"Kozyrakis","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mark","family":"Horowitz","sequence":"additional","affiliation":[{"name":"Stanford University, Stanford, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,3,23]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2008.1"},{"volume-title":"Color Imaging Array. US Patent Application No. 3971065","year":"1976","author":"Bayer B.","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2006.873163"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/320080.320093"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.848473"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815968"},{"volume-title":"Adaptive Color Plane Interpolation in Single Sensor Color Electronic Camera. US Patent Application No. 5629734","year":"1997","author":"Hamilton J.F.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485922.2485964"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_2_1_11_1","unstructured":"NVIDIA Inc. Tegra mobile processors. http:\/\/www.nvidia.com\/object\/tegra-4-processor.html."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.81"},{"key":"e_1_2_1_14_1","unstructured":"Tensilica Inc. Tensilica Instruction Extension (TIE) Language Reference Manual."},{"key":"e_1_2_1_15_1","unstructured":"Texas Instruments Inc. OMAP 5 platform. www.ti.com\/omap."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1736020.1736044"}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2735841","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2735841","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:16:36Z","timestamp":1750227396000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2735841"}},"subtitle":["balancing efficiency and flexibility in specialized computing"],"short-title":[],"issued":{"date-parts":[[2015,3,23]]},"references-count":15,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,3,23]]}},"alternative-id":["10.1145\/2735841"],"URL":"https:\/\/doi.org\/10.1145\/2735841","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"type":"print","value":"0001-0782"},{"type":"electronic","value":"1557-7317"}],"subject":[],"published":{"date-parts":[[2015,3,23]]},"assertion":[{"value":"2015-03-23","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}