{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:28:19Z","timestamp":1750307299429,"version":"3.41.0"},"reference-count":19,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2010,9,14]],"date-time":"2010-09-14T00:00:00Z","timestamp":1284422400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2010,9,14]]},"abstract":"<jats:p>This paper demonstrates and evaluates the performance and scalability of the systolic computational-memory array (SCMA) for stencil computation, a typical computing kernel of scientific simulation. We describe the basic architecture of the SCMA and present the requirements and design that allow SCMAs to operate scalably over multiple devices. We implement a prototype SCMA with three ALTERA Stratix III FPGAs, which form a 1×3 FPGA array by connecting three DE3 boards with different clock sources. The prototype SCMA demonstrates that the difference in operating clock frequencies has little influence on the total number of execution cycles, although it causes a few stall cycles in the sub-SCMAs on different FPGAs. 
Using three benchmark programs of typical computing kernels based on the finite difference method, we show that adding FPGAs increases performance in proportion to the number of devices, resulting in almost linear speedup.<\/jats:p>","DOI":"10.1145\/1926367.1926381","type":"journal-article","created":{"date-parts":[[2011,1,24]],"date-time":"2011-01-24T14:58:13Z","timestamp":1295881093000},"page":"80-86","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Prototype implementation of array-processor extensible over multiple FPGAs for scalable stencil computation"],"prefix":"10.1145","volume":"38","author":[{"given":"Kentaro","family":"Sano","sequence":"first","affiliation":[{"name":"Tohoku University, Sendai, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luzhou","family":"Wang","sequence":"additional","affiliation":[{"name":"Tohoku University, Sendai, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Satoru","family":"Yamamoto","sequence":"additional","affiliation":[{"name":"Tohoku University, Sendai, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2011,1,14]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/968280.968311"},{"key":"e_1_2_1_2_1","first-page":"1","volume-title":"Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing","author":"Datta K.","year":"2009","unstructured":"K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick. 
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing, pages 1--12, 2009."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025123.1025825"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/54.748803"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/AHS.2007.71"},{"volume-title":"WA","year":"2009","key":"e_1_2_1_6_1","unstructured":"J. D. Davis et al. BEE3: Revitalizing computer architecture research. MSR-TR-2009-45, Microsoft Research, Redmond, WA, 2009."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.241423"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1982.1653825"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2008.4762383"},{"key":"e_1_2_1_10_1","volume-title":"IEEE Southern Programmable Logic Conference 2009 Proceedings","author":"Mencer O.","year":"2009","unstructured":"O. Mencer et al. Cube: A 512-FPGA cluster. IEEE Southern Programmable Logic Conference 2009 Proceedings, 2009."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/40.592312"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/846213.846522"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.61"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1862648.1862651"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA'08)","author":"Sano K.","year":"2008","unstructured":"K. Sano, W. Luzhou, Y. Hatsuda, and S. Yamamoto. 
Scalable FPGA-array for high-performance and power-efficient computation based on difference schemes. Proceedings of the International Workshop on High-Performance Reconfigurable Computing Technology and Applications (HPRCTA'08), November 2008. DOI: 10.1109\/HPRCTA.2008.4745679."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:SUPE.0000045211.07895.cb"},{"key":"e_1_2_1_17_1","volume-title":"http:\/\/www.terasic.com.tw\/en\/","author":"Technologies Terasic","year":"2010","unstructured":"Terasic Technologies. http:\/\/www.terasic.com.tw\/en\/. 2010."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.486081"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"}],"container-title":["ACM SIGARCH Computer Architecture 
News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1926367.1926381","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1926367.1926381","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:51Z","timestamp":1750244391000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1926367.1926381"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,9,14]]},"references-count":19,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,9,14]]}},"alternative-id":["10.1145\/1926367.1926381"],"URL":"https:\/\/doi.org\/10.1145\/1926367.1926381","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2010,9,14]]},"assertion":[{"value":"2011-01-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}