{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T15:15:25Z","timestamp":1771514125709,"version":"3.50.1"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T00:00:00Z","timestamp":1570752000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100007601","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["688403"],"award-info":[{"award-number":["688403"]}],"id":[{"id":"10.13039\/501100007601","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2019,12,31]]},"abstract":"<jats:p>Iterative Stencil Loops (ISLs) are the key kernel within a range of compute-intensive applications. To accelerate ISLs with Field Programmable Gate Arrays, it is critical to exploit parallelism (1) among elements within the same iteration and (2) across loop iterations. We propose a novel ISL acceleration scheme called Direct Computation of Multiple Iterations (DCMI) that improves upon prior work by pre-computing the effective stencil coefficients after a number of iterations at design time\u2014resulting in accelerators that use minimal on-chip memory and avoid redundant computation. 
This enables DCMI to improve throughput by up to 7.7\u00d7 compared to the state-of-the-art cone-based architecture.<\/jats:p>","DOI":"10.1145\/3352813","type":"journal-article","created":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T14:53:33Z","timestamp":1570805613000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["DCMI"],"prefix":"10.1145","volume":"16","author":[{"given":"Mostafa","family":"Koraei","sequence":"first","affiliation":[{"name":"University of Tehran, Iran and Norwegian University of Science and Technology, Trondheim, Norway"}]},{"given":"Omid","family":"Fatemi","sequence":"additional","affiliation":[{"name":"University of Tehran, Iran"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9147-5228","authenticated-orcid":false,"given":"Magnus","family":"Jahre","sequence":"additional","affiliation":[{"name":"Norwegian University of Science and Technology, Trondheim, Norway"}]}],"member":"320","published-online":{"date-parts":[[2019,10,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2006.73"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851497"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941487.1941507"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2842615"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO). 1--13","author":"Caulfield A. M.","unstructured":"A. M. Caulfield , E. S. Chung , A. Putnam , H. Angepat , J. Fowers , M. Haselman , S. Heil , M. Humphrey , P. Kaur , J. Y. Kim , D. Lo , T. Massengill , K. Ovtcharov , M. Papamichael , L. Woods , S. Lanka , D. Chiou , and D. Burger . 2016. A cloud-scale acceleration architecture . In Proceedings of the International Symposium on Microarchitecture (MICRO). 1--13 . A. M. Caulfield, E. S. Chung, A. Putnam, H. Angepat, J. Fowers, M. Haselman, S. 
Heil, M. Humphrey, P. Kaur, J. Y. Kim, D. Lo, T. Massengill, K. Ovtcharov, M. Papamichael, L. Woods, S. Lanka, D. Chiou, and D. Burger. 2016. A cloud-scale acceleration architecture. In Proceedings of the International Symposium on Microarchitecture (MICRO). 1--13."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240765.3240850"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the Design Automation Conference (DAC\u201914)","author":"Cong Jason","year":"2014","unstructured":"Jason Cong , Peng Li , Bingjun Xiao , and Peng Zhang . 2014 . An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers . In Proceedings of the Design Automation Conference (DAC\u201914) . 77:1--77:6. Jason Cong, Peng Li, Bingjun Xiao, and Peng Zhang. 2014. An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers. In Proceedings of the Design Automation Conference (DAC\u201914). 77:1--77:6."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5222004"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201917)","author":"Deest G.","unstructured":"G. Deest , T. Yuki , S. Rajopadhye , and S. Derrien . 2017. One size does not fit all: Implementation trade-offs for iterative stencil computations on FPGAs . In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201917) . 1--8. G. Deest, T. Yuki, S. Rajopadhye, and S. Derrien. 2017. One size does not fit all: Implementation trade-offs for iterative stencil computations on FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201917). 1--8."},{"key":"e_1_2_1_10_1","volume-title":"International Conference on Reconfigurable Computing and FPGAs (ReConFig\u201913)","author":"Dohi K.","unstructured":"K. 
Dohi , K. Fukumoto , Y. Shibata , and K. Oguri . 2013. Performance modeling and optimization of 3-D stencil computation on a stream-based FPGA accelerator . In International Conference on Reconfigurable Computing and FPGAs (ReConFig\u201913) . 1--6. K. Dohi, K. Fukumoto, Y. Shibata, and K. Oguri. 2013. Performance modeling and optimization of 3-D stencil computation on a stream-based FPGA accelerator. In International Conference on Reconfigurable Computing and FPGAs (ReConFig\u201913). 1--6."},{"key":"e_1_2_1_11_1","unstructured":"B. A. Draper J. R. Beveridge A. P. W. Bohm C. Ross and M. Chawathe. 2002. Implementing image applications on FPGAs. In Object Recognition Supported by User Interaction for Service Robots Vol. 3. 265--268.  B. A. Draper J. R. Beveridge A. P. W. Bohm C. Ross and M. Chawathe. 2002. Implementing image applications on FPGAs. In Object Recognition Supported by User Interaction for Service Robots Vol. 3. 265--268."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/LAWP.2003.812245"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174251"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA\u201911)","author":"Fu Haohuan","unstructured":"Haohuan Fu and Robert G. Clapp . 2011. Eliminating the memory bottleneck: An FPGA-based solution for 3D reverse time migration . In Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA\u201911) . 65--74. Haohuan Fu and Robert G. Clapp. 2011. Eliminating the memory bottleneck: An FPGA-based solution for 3D reverse time migration. In Proceedings of the International Symposium on Field Programmable Gate Arrays (FPGA\u201911). 
65--74."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815968"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Conference on Supercomputing (SC\u201912)","author":"Holewinski Justin","unstructured":"Justin Holewinski , Louis-No\u00ebl Pouchet , and P. Sadayappan . 2012. High-performance code generation for stencil computations on GPU architectures . In Proceedings of the International Conference on Supercomputing (SC\u201912) . 311--320. Justin Holewinski, Louis-No\u00ebl Pouchet, and P. Sadayappan. 2012. High-performance code generation for stencil computations on GPU architectures. In Proceedings of the International Conference on Supercomputing (SC\u201912). 311--320."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the International Conference on Field-Programmable Technology (FPT\u201911)","author":"Hussain T.","unstructured":"T. Hussain , M. Peric\u00e0s , N. Navarro , and E. Ayguad\u00e9 . 2011. Implementation of a reverse time migration kernel using the HCE high level synthesis tool . In Proceedings of the International Conference on Field-Programmable Technology (FPT\u201911) . 1--8. T. Hussain, M. Peric\u00e0s, N. Navarro, and E. Ayguad\u00e9. 2011. Implementation of a reverse time migration kernel using the HCE high level synthesis tool. In Proceedings of the International Conference on Field-Programmable Technology (FPT\u201911). 1--8."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/SBAC-PAD.2016.18"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2800788"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037749"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the International Conference on Networking and Computing (ICNC\u201912)","author":"Kobayashi R.","unstructured":"R. Kobayashi , S. Takamaeda-Yamazaki , and K. Kise . 2012. Towards a low-power accelerator of many FPGAs for stencil computations . 
In Proceedings of the International Conference on Networking and Computing (ICNC\u201912) . 343--349. R. Kobayashi, S. Takamaeda-Yamazaki, and K. Kise. 2012. Towards a low-power accelerator of many FPGAs for stencil computations. In Proceedings of the International Conference on Networking and Computing (ICNC\u201912). 343--349."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN\u201997)","author":"Koseki A.","unstructured":"A. Koseki , H. Komastu , and Y. Fukazawa . 1997. A method for estimating optimal unrolling times for nested loops . In Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN\u201997) . 376--382. A. Koseki, H. Komastu, and Y. Fukazawa. 1997. A method for estimating optimal unrolling times for nested loops. In Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN\u201997). 376--382."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2006.884574"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1034774.1034777"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542313"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP\u201905)","volume":"3","author":"Motuk E.","unstructured":"E. Motuk , R. Woods , and S. Bilbao . 2005. Implementation of finite difference schemes for the wave equation on FPGA . In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP\u201905) , Vol. 3 . 237--240. E. Motuk, R. Woods, and S. Bilbao. 2005. Implementation of finite difference schemes for the wave equation on FPGA. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP\u201905), Vol. 3. 
237--240."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2007.898785"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/125826.126004"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463209.2488797"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201916)","author":"Nagasu K.","unstructured":"K. Nagasu , K. Sano , F. Kono , N. Nakasato , A. Vazhenin , and S. Sedukhin . 2016. Parallelism for high-performance tsunami simulation with FPGA: Spatial or temporal? . In Proceedings of the International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201916) . 30--30. K. Nagasu, K. Sano, F. Kono, N. Nakasato, A. Vazhenin, and S. Sedukhin. 2016. Parallelism for high-performance tsunami simulation with FPGA: Spatial or temporal?. In Proceedings of the International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201916). 30--30."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201916)","author":"Natale G.","unstructured":"G. Natale , G. Stramondo , P. Bressana , R. Cattaneo , D. Sciuto , and M. D. Santambrogio . 2016. A polyhedral model-based framework for dataflow implementation on FPGA devices of iterative stencil loops . In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201916) . 1--8. G. Natale, G. Stramondo, P. Bressana, R. Cattaneo, D. Sciuto, and M. D. Santambrogio. 2016. A polyhedral model-based framework for dataflow implementation on FPGA devices of iterative stencil loops. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201916). 
1--8."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2927964.2927967"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the Hot Chips Symposium (HCS\u201914)","author":"Ouyang Jian","year":"2014","unstructured":"Jian Ouyang , Shiding Lin , Wei Qi , Yong Wang , Bo Yu , and Song Jiang . 2014 . SDA: Software-defined accelerator for large-scale DNN systems . In Proceedings of the Hot Chips Symposium (HCS\u201914) . 1--23. Jian Ouyang, Shiding Lin, Wei Qi, Yong Wang, Bo Yu, and Song Jiang. 2014. SDA: Software-defined accelerator for large-scale DNN systems. In Proceedings of the Hot Chips Symposium (HCS\u201914). 1--23."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201918)","author":"Panda R.","unstructured":"R. Panda , S. Song , J. Dean , and L. K. John . 2018. Wait of a decade: Did SPEC CPU 2017 broaden the performance horizon? In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201918) . 271--282. R. Panda, S. Song, J. Dean, and L. K. John. 2018. Wait of a decade: Did SPEC CPU 2017 broaden the performance horizon? In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA\u201918). 271--282."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853195"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201913)","author":"Qadeer Wajahat","unstructured":"Wajahat Qadeer , Rehan Hameed , Ofer Shacham , Preethi Venkatesan , Christos Kozyrakis , and Mark A. Horowitz . 2013. Convolution engine: Balancing efficiency & flexibility in specialized computing . In Proceedings of the International Symposium on Computer Architecture (ISCA\u201913) . 24--35. Wajahat Qadeer, Rehan Hameed, Ofer Shacham, Preethi Venkatesan, Christos Kozyrakis, and Mark A. Horowitz. 2013. 
Convolution engine: Balancing efficiency & flexibility in specialized computing. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201913). 24--35."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201917)","author":"Rabozzi M.","unstructured":"M. Rabozzi , G. Natale , B. Festa , A. Miele , and M. D. Santambrogio . 2017. Optimizing streaming stencil time-step designs via FPGA floorplanning . In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201917) . 1--4. M. Rabozzi, G. Natale, B. Festa, A. Miele, and M. D. Santambrogio. 2017. Optimizing streaming stencil time-step designs via FPGA floorplanning. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201917). 1--4."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3150211"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the International Conference on Computing Frontiers (CF\u201911)","author":"Faizur Rahman Shah M.","year":"2011","unstructured":"Shah M. Faizur Rahman , Qing Yi , and Apan Qasem . 2011 . Understanding stencil code performance on multicore architectures . In Proceedings of the International Conference on Computing Frontiers (CF\u201911) . 1--10. Shah M. Faizur Rahman, Qing Yi, and Apan Qasem. 2011. Understanding stencil code performance on multicore architectures. In Proceedings of the International Conference on Computing Frontiers (CF\u201911). 1--10."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2016.2545408"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201900)","author":"Rixner Scott","unstructured":"Scott Rixner , William J. Dally , Ujval J. Kapasi , Peter Mattson , and John D. Owens . 2000. Memory access scheduling . 
In Proceedings of the International Symposium on Computer Architecture (ISCA\u201900) . 128--138. Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens. 2000. Memory access scheduling. In Proceedings of the International Symposium on Computer Architecture (ISCA\u201900). 128--138."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201911)","author":"Sano K.","unstructured":"K. Sano , Y. Hatsuda , and S. Yamamoto . 2011. Scalable streaming-array of simple soft-processors for stencil computations with constant memory-bandwidth . In Proceedings of the International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201911) . 234--241. K. Sano, Y. Hatsuda, and S. Yamamoto. 2011. Scalable streaming-array of simple soft-processors for stencil computations with constant memory-bandwidth. In Proceedings of the International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201911). 234--241."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.51"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2082156.2082168"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/335231.335246"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the International Symposium on Object\/Component\/Service-Oriented Real-Time Distributed Computing Workshops. 180--187","author":"Schmidt M.","unstructured":"M. Schmidt , M. Reichenbach , and D. Fey . 2012. A generic VHDL template for 2D stencil code applications on FPGAs . In Proceedings of the International Symposium on Object\/Component\/Service-Oriented Real-Time Distributed Computing Workshops. 180--187 . M. Schmidt, M. Reichenbach, and D. Fey. 2012. A generic VHDL template for 2D stencil code applications on FPGAs. 
In Proceedings of the International Symposium on Object\/Component\/Service-Oriented Real-Time Distributed Computing Workshops. 180--187."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the International Conference on Field-Programmable Technology (FPT\u201909)","author":"Shafiq M.","unstructured":"M. Shafiq , M. Peric\u00e0s , R. de la Cruz, M. Araya-Polo, N. Navarro, and E. Ayguad\u00e9. 2009. Exploiting memory customization in FPGA for 3D stencil computations . In Proceedings of the International Conference on Field-Programmable Technology (FPT\u201909) . 38--45. M. Shafiq, M. Peric\u00e0s, R. de la Cruz, M. Araya-Polo, N. Navarro, and E. Ayguad\u00e9. 2009. Exploiting memory customization in FPGA for 3D stencil computations. In Proceedings of the International Conference on Field-Programmable Technology (FPT\u201909). 38--45."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/357401.357403"},{"key":"e_1_2_1_49_1","volume-title":"SPEC CPU 2017","author":"SPEC.","year":"2017","unstructured":"SPEC. 2017 . SPEC CPU 2017 . Retrieved from https:\/\/www.spec.org\/cpu2017\/. SPEC. 2017. SPEC CPU 2017. Retrieved from https:\/\/www.spec.org\/cpu2017\/."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2584665"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA\u201915)","author":"Takei Yasuhiro","year":"2015","unstructured":"Yasuhiro Takei , Hasitha Muthumala Waidyasooriya , Masanori Hariyama , and Michitaka Kameyama . 2015 . FPGA-oriented design of an FDTD accelerator based on overlapped tiling . In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA\u201915) . 72--77. Yasuhiro Takei, Hasitha Muthumala Waidyasooriya, Masanori Hariyama, and Michitaka Kameyama. 2015. FPGA-oriented design of an FDTD accelerator based on overlapped tiling. 
In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA\u201915). 72--77."},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the International Conference on Computer and Information Science (ICIS\u201916)","author":"Waidyasooriya H. M.","unstructured":"H. M. Waidyasooriya and M. Hariyama . 2016. FPGA-based deep-pipelined architecture for FDTD acceleration using OpenCL . In Proceedings of the International Conference on Computer and Information Science (ICIS\u201916) . 1--6. H. M. Waidyasooriya and M. Hariyama. 2016. FPGA-based deep-pipelined architecture for FDTD acceleration using OpenCL. In Proceedings of the International Conference on Computer and Information Science (ICIS\u201916). 1--6."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2016.2614981"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the Design Automation Conference (DAC\u201917)","author":"Wang S.","unstructured":"S. Wang and Y. Liang . 2017. A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model . In Proceedings of the Design Automation Conference (DAC\u201917) . 1--6. S. Wang and Y. Liang. 2017. A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model. In Proceedings of the Design Automation Conference (DAC\u201917). 1--6."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1015460304860"},{"key":"e_1_2_1_56_1","unstructured":"Xilinx. 2010. Memory interface solutions. Retrieved from https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/ug086.pdf.  Xilinx. 2010. Memory interface solutions. Retrieved from https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/ug086.pdf."},{"key":"e_1_2_1_57_1","unstructured":"Xilinx. 2012. 7 series FPGAs memory interface solutions. 
Retrieved from https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/mig_7series\/v1_4\/ug586_7Series_MIS.pdf.  Xilinx. 2012. 7 series FPGAs memory interface solutions. Retrieved from https:\/\/www.xilinx.com\/support\/documentation\/ip_documentation\/mig_7series\/v1_4\/ug586_7Series_MIS.pdf."},{"key":"e_1_2_1_58_1","unstructured":"Xilinx. 2018. Vivado high-level synthesis. Retrieved from https:\/\/www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html.  Xilinx. 2018. Vivado high-level synthesis. Retrieved from https:\/\/www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html."},{"key":"e_1_2_1_59_1","unstructured":"Xilinx. 2019. Retrieved from VC709 evaluation board for the Virtex-7 FPGA -- User guide. https:\/\/www.xilinx.com\/support\/documentation\/boards_and_kits\/vc709\/ug887-vc709-eval-board-v7-fpga.pdf.  Xilinx. 2019. Retrieved from VC709 evaluation board for the Virtex-7 FPGA -- User guide. https:\/\/www.xilinx.com\/support\/documentation\/boards_and_kits\/vc709\/ug887-vc709-eval-board-v7-fpga.pdf."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3174243.3174248"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPDC.2012.17"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3352813","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3352813","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:37Z","timestamp":1750268977000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3352813"}},"subtitle":["A Scalable Strategy for Accelerating Iterative Stencil Loops on 
FPGAs"],"short-title":[],"issued":{"date-parts":[[2019,10,11]]},"references-count":61,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,12,31]]}},"alternative-id":["10.1145\/3352813"],"URL":"https:\/\/doi.org\/10.1145\/3352813","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,11]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}