{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:26:53Z","timestamp":1750307213853,"version":"3.41.0"},"reference-count":12,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2011,12,19]],"date-time":"2011-12-19T00:00:00Z","timestamp":1324252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2011,12,19]]},"abstract":"<jats:p>This paper presents the domain-specific programmable design of custom computing machines for high-performance stencil computation. Stencil computation is one of the typical kernels in scientific computations, however its low operational-intensity makes the sustained performance limited by memory bandwidth on recent microprocessors and GPUs. So far we have proposed a scalable streaming-array (SSA) of processing elements, which provides almost linear scalability by increasing FPGAs with a constant external memory bandwidth. In order to facilitate custom computing and efficiently utilize hardware resources for various and complex stencil-computations, we design programmable SSA with limited but necessary functionality. We show the design concept, the programmable structure and the SIMD instruction set for SSA. 
Prototype implementation with nine FPGAs demonstrates that our programmable design with a lot of floating-point units exploits hardware resources well, efficiently achieving 260 GFlop\/s, which is 87.4% of the peak, at 1295 MFlop\/sW.<\/jats:p>","DOI":"10.1145\/2082156.2082168","type":"journal-article","created":{"date-parts":[[2011,12,27]],"date-time":"2011-12-27T15:22:22Z","timestamp":1324999342000},"page":"44-49","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Domain-specific programmable design of scalable streaming-array for power-efficient stencil computation"],"prefix":"10.1145","volume":"39","author":[{"given":"Kentaro","family":"Sano","sequence":"first","affiliation":[{"name":"Sciences, Tohoku University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Satoru","family":"Yamamoto","sequence":"additional","affiliation":[{"name":"Sciences, Tohoku University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yoshiaki","family":"Hatsuda","sequence":"additional","affiliation":[{"name":"Kobo, Co., Ltd."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,12,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03869-3_72"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1413370.1413375"},{"key":"e_1_2_1_3_1","volume-title":"Applied Iterative Methods","author":"Hageman L. A.","year":"1981","unstructured":"L. A. Hageman and D. M. Young. Applied Iterative Methods. 
Academic Press, 1981."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1058426.1058874"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/180\/1\/012043"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470394"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2011.12"},{"key":"e_1_2_1_8_1","unstructured":"Terasic Technologies. http:\/\/www.terasic.com."},{"key":"e_1_2_1_9_1","unstructured":"The Green500 List. http:\/\/www.green500.org."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMPSAC.2009.82"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_1_12_1","volume-title":"Fall","author":"Wolfram S.","year":"1983","unstructured":"S. Wolfram. Cellular automata. 
Los Alamos Science, (9), Fall 1983."}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2082156.2082168","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2082156.2082168","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:06:42Z","timestamp":1750241202000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2082156.2082168"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,12,19]]},"references-count":12,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2011,12,19]]}},"alternative-id":["10.1145\/2082156.2082168"],"URL":"https:\/\/doi.org\/10.1145\/2082156.2082168","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2011,12,19]]},"assertion":[{"value":"2011-12-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}