{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T11:36:24Z","timestamp":1778067384792,"version":"3.51.4"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T00:00:00Z","timestamp":1714435200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Japan Society for the Promotion of Science (JSPS) KAKENHI","award":["JP20H00593"],"award-info":[{"award-number":["JP20H00593"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>\n            Stencil-based applications play an essential role in high-performance systems as they occur in numerous computational areas, such as partial differential equation solving. In this context,\n            <jats:bold>Iterative Stencil Loops (ISLs)<\/jats:bold>\n            represent a prominent and well-known algorithmic class within the stencil domain. Specifically, ISL-based calculations iteratively apply the same stencil to a multi-dimensional point grid multiple times or until convergence. However, due to their iterative and intensive nature, ISLs are highly performance-hungry, demanding specialized solutions. Here,\n            <jats:bold>Field Programmable Gate Arrays (FPGAs)<\/jats:bold>\n            represent a valid architectural choice as they enable the design of custom, parallel, and scalable ISL accelerators. Besides, the regular structure of ISLs makes them an ideal candidate for automatic optimization and generation flows. For these reasons, this article introduces\n            <jats:sc>Senju<\/jats:sc>\n            , an automation framework for the design of highly parallel ISL accelerators targeting single-\/multi-FPGA systems. Given an input description,\n            <jats:sc>Senju<\/jats:sc>\n            automates the entire design process and provides accurate performance estimations. The experimental evaluation shows remarkable and scalable results, outperforming single- and multi-FPGA literature approaches under different metrics. Finally, we present a new analysis of temporal and spatial parallelism trade-offs in a real-case scenario and discuss our performance through a single- and novel specialized multi-FPGA formulation of the Roofline Model.\n          <\/jats:p>","DOI":"10.1145\/3634920","type":"journal-article","created":{"date-parts":[[2023,11,29]],"date-time":"2023-11-29T12:01:17Z","timestamp":1701259277000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Across Time and Space:\n            <scp>Senju<\/scp>\n            \u2019s Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3101-8118","authenticated-orcid":false,"given":"Emanuele","family":"Del Sozzo","sequence":"first","affiliation":[{"name":"RIKEN Center for Computational Science, Kobe, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5834-0812","authenticated-orcid":false,"given":"Davide","family":"Conficconi","sequence":"additional","affiliation":[{"name":"Politecnico di Milano, Milan, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6681-4192","authenticated-orcid":false,"given":"Kentaro","family":"Sano","sequence":"additional","affiliation":[{"name":"RIKEN Center for Computational Science, Kobe, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,4,30]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"AMD. 2023. Versal Premium Series VPK120 Evaluation Kit. Retrieved from https:\/\/www.xilinx.com\/products\/boards-and-kits\/vpk120.html"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.107"},{"key":"e_1_3_2_4_2","article-title":"PLUTO Compiler - Examples","author":"Bondhugula Uday","year":"2008","unstructured":"Uday Bondhugula. 2008. PLUTO Compiler - Examples. Retrieved from https:\/\/github.com\/bondhugula\/pluto\/tree\/master\/examples","journal-title":"Retrieved from"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2012.04.017"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/2842615"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783710"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240850"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.70"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TETC.2022.3157948"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593090"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-16-7487-7_14"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2022.3218898"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSI.2018.8548388"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3543622.3573170"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3532989"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2005.09.021"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-007-0111-y"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/InPar.2012.6339595"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/573190"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304619"},{"key":"e_1_3_2_23_2","unstructured":"Intel. 2017. Open Programmable Acceleration Engine. Retrieved from https:\/\/opae.github.io\/latest\/index.html#user-docs"},{"key":"e_1_3_2_24_2","unstructured":"Intel. 2019. Intel FPGA PAC D5005. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/products\/sku\/193921\/intel-fpga-pac-d5005\/specifications.html"},{"key":"e_1_3_2_25_2","unstructured":"Intel. 2021. Intel\u00ae HLS Compiler Pro Edition Reference Manual. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/programmable\/683349\/21-4\/pro-edition-reference-manual.html"},{"key":"e_1_3_2_26_2","unstructured":"Intel. 2022. Avalon Interfaces. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/programmable\/683091\/22-3\/introduction-to-the-interface-specifications.html"},{"key":"e_1_3_2_27_2","unstructured":"Intel. 2022. FPGA Interface Manager. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/programmable\/683193\/current\/fim.html"},{"key":"e_1_3_2_28_2","unstructured":"Intel. 2022. Intel FPGA Acceleration Card Solutions. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/products\/details\/fpga\/platforms\/pac.html"},{"key":"e_1_3_2_29_2","unstructured":"Intel. 2022. Intel High-Level Synthesis Compiler. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/software\/programmable\/quartus-prime\/hls-compiler.html"},{"key":"e_1_3_2_30_2","unstructured":"Intel. 2022. Intel Quartus Prime. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/products\/details\/fpga\/development-tools\/quartus-prime.html"},{"key":"e_1_3_2_31_2","unstructured":"Intel. 2022. Intel Stratix 10 GX. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/products\/details\/fpga\/stratix\/10\/gx.html"},{"key":"e_1_3_2_32_2","unstructured":"Intel. 2023. Intel Agilex\u00ae 7 FPGA I-Series Development Kit. Retrieved from https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/235244\/intel-agilex-7-fpga-iseries-development-kit-es2-2x-rtile-1x-ftile.html"},{"key":"e_1_3_2_33_2","unstructured":"Intel. 2023. Logic Array Blocks and Adaptive Logic Modules in Intel\u00ae Arria\u00ae 10 Devices. Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/programmable\/683461\/current\/logic-array-blocks-and-adaptive-logic-05488.html"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1137\/07070574X"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45234-8_73"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1029\/96JC02775"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542313"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ReCoSoC.2017.8016148"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2463209.2488797"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-59334-5_7"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2966986.2966995"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICESS.2019.8782524"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3461478"},{"key":"e_1_3_2_44_2","first-page":"1","volume-title":"Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms","author":"Richter Franz","year":"2012","unstructured":"Franz Richter, Michael Schmidt, and Dietmar Fey. 2012. A configurable VHDL template for parallelization of 3D stencil codes on FPGAs. In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms. The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 1\u20138."},{"issue":"2","key":"e_1_3_2_45_2","first-page":"164","article-title":"Generalized jacobi and gauss-seidel methods for solving linear system of equations","volume":"16","author":"Salkuyeh Davod Khojasteh","year":"2007","unstructured":"Davod Khojasteh Salkuyeh. 2007. Generalized jacobi and gauss-seidel methods for solving linear system of equations. NUMERICAL MATHEMATICS-ENGLISH SERIES- 16, 2 (2007), 164.","journal-title":"NUMERICAL MATHEMATICS-ENGLISH SERIES-"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.51"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578178.3579341"},{"key":"e_1_3_2_48_2","volume-title":"Computer Vision","author":"Shapiro Linda G.","year":"2001","unstructured":"Linda G. Shapiro, George C. Stockman, et\u00a0al. 2001. Computer Vision. Vol. 3. Prentice Hall New Jersey."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2021.3111761"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415730"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0036144599363084"},{"key":"e_1_3_2_52_2","volume-title":"Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing","author":"Stratton John A.","year":"2012","unstructured":"John A. Stratton, Christopher Rodrigrues, I-Jui Sung, Nady Obeid, Liwen Chang, Geng Liu, and Wen-Mei W. Hwu. 2012. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing. In Proceedings of the Technical Report IMPACT-12-01. University of Illinois at Urbana-Champaign, Urbana."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989508"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2014.2386883"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572547"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT47387.2019.00068"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","unstructured":"Hasitha Muthumala Waidyasooriya and Masanori Hariyama. 2019. Multi-FPGA accelerator architecture for stencil computation exploiting spacial and temporal scalability. IEEE Access 7 (2019) 53188\u201353201. DOI:10.1109\/ACCESS.2019.2910824","DOI":"10.1109\/ACCESS.2019.2910824"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2614981"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062185"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01217347"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3634920","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3634920","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:51:07Z","timestamp":1750287067000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3634920"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,30]]},"references-count":60,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3634920"],"URL":"https:\/\/doi.org\/10.1145\/3634920","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,30]]},"assertion":[{"value":"2023-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-16","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}