{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:16:14Z","timestamp":1750306574755,"version":"3.41.0"},"reference-count":9,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,12,3]],"date-time":"2014-12-03T00:00:00Z","timestamp":1417564800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2014,12,3]]},"abstract":"<jats:p>In this paper, we propose a framework to assist memory access optimization for stencil computation on an FPGA accelerator. Since stencil computations such as scientific simulations need large amounts of data, efficient memory access is key to achieving high performance on FPGA accelerators. Therefore, we implemented a stencil computation framework with a memory performance profiler on MaxCompiler, a high-level synthesis system. The memory profiler enables us to measure clock cycles for various memory controller states: data transfer, stall, and idle. We also implemented simple stencil computations and practical FDTD electromagnetic field simulations on top of the framework with various parameters to evaluate and analyze memory performance. In experiments running the simple stencil computations on a MAX34245A Data Flow Engine, approximately 70% of the peak memory performance was achieved for various stencil types. On the other hand, the FDTD simulations, which need many data streams, could not reach this memory performance saturation point because of the increasing complexity of the memory controller modules. 
Through the analysis of evaluation results obtained by our memory performance profiling framework, a promising memory access optimization approach for stencil computations in which the complexity of the memory controller is traded off against data access traffic is suggested.<\/jats:p>","DOI":"10.1145\/2693714.2693727","type":"journal-article","created":{"date-parts":[[2014,12,8]],"date-time":"2014-12-08T16:17:14Z","timestamp":1418055434000},"page":"69-74","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["A Memory Profiling Framework for Stencil Computation on an FPGA Accelerator with High Level Synthesis"],"prefix":"10.1145","volume":"42","author":[{"given":"Rie","family":"Soejima","sequence":"first","affiliation":[{"name":"Nagasaki University, Japan"}]},{"given":"Koji","family":"Okina","sequence":"additional","affiliation":[{"name":"Nagasaki University, Japan"}]},{"given":"Keisuke","family":"Dohi","sequence":"additional","affiliation":[{"name":"Nagasaki University, Japan"}]},{"given":"Yuichiro","family":"Shibata","sequence":"additional","affiliation":[{"name":"Nagasaki University, Japan"}]},{"given":"Kiyoshi","family":"Oguri","sequence":"additional","affiliation":[{"name":"Nagasaki University, Japan"}]}],"member":"320","published-online":{"date-parts":[[2014,12,3]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Maxeler Technologies, \"Multiscale Dataflow Programming, Version 2013.3,\" Dec 2013."},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"K. Sano, \"FPGA-based systolic computational-memory array for scalable stencil computations,\" in High-Performance Computing Using FPGAs, pp. 279--303, 2013.","DOI":"10.1007\/978-1-4614-1791-0_9"},
{"issue":"8","key":"e_1_2_1_3_1","first-page":"1676","volume":"96","author":"Dohi K.","year":"2013","unstructured":"K. Dohi, K. Negi, Y. Shibata, and K. Oguri, \"FPGA Implementation of Human Detection by HOG Features with AdaBoost,\" vol. 96, no. 8, pp. 1676--1684, 2013.","journal-title":"\"FPGA Implementation of Human Detection by HOG Features with AdaBoost,\""},{"key":"e_1_2_1_4_1","first-page":"458","volume-title":"2012 22nd International Conference on","author":"Dohi K.","year":"2012","unstructured":"K. Dohi, Y. Hatanaka, K. Negi, Y. Shibata, and K. Oguri, \"Deep-pipelined FPGA implementation of ellipse estimation for eye tracking,\" in Field Programmable Logic and Applications (FPL), 2012 22nd International Conference on, pp. 458--463, 2012."},{"volume-title":"SC '08","author":"Datta K.","key":"e_1_2_1_5_1","unstructured":"K. Datta, M. Murphy, V. Volkov, S. Williams, J. Carter, L. Oliker, D. Patterson, J. Shalf, and K. Yelick, \"Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures,\" in Proceedings of the 2008 ACM\/IEEE Conference on Supercomputing, SC '08, (Piscataway, NJ, USA), pp. 4:1--4:12, 2008."},
{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 2000 ACM\/IEEE Conference on Supercomputing, SC '00","author":"Rivera G.","year":"2000","unstructured":"G. Rivera and C.-W. Tseng, \"Tiling optimizations for 3d scientific computations,\" in Proceedings of the 2000 ACM\/IEEE Conference on Supercomputing, SC '00, (Washington, DC, USA), 2000."},{"issue":"1","key":"e_1_2_1_7_1","first-page":"115","volume":"39","author":"Jiayuan M.","year":"2010","unstructured":"M. Jiayuan and S. Kevin, \"A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations,\" vol. 39, no. 1, pp. 115--142, 2010.","journal-title":"\"A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations,\""},{"key":"e_1_2_1_8_1","first-page":"623","volume-title":"Numerical solution of steady-state electromagnetic scattering problems using the time-dependent maxwell's equations,\" Microwave Theory and Techniques","author":"Taflove A.","year":"1975","unstructured":"A. Taflove and M. Brodwin, \"Numerical solution of steady-state electromagnetic scattering problems using the time-dependent Maxwell's equations,\" Microwave Theory and Techniques, IEEE Transactions on, vol. 23, pp. 623--630, Aug 1975."},
{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAP.1966.1138693"}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2693714.2693727","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2693714.2693727","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:31Z","timestamp":1750227211000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2693714.2693727"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12,3]]},"references-count":9,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,12,3]]}},"alternative-id":["10.1145\/2693714.2693727"],"URL":"https:\/\/doi.org\/10.1145\/2693714.2693727","relation":{},"ISSN":["0163-5964"],"issn-type":[{"type":"print","value":"0163-5964"}],"subject":[],"published":{"date-parts":[[2014,12,3]]},"assertion":[{"value":"2014-12-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}