{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:13:59Z","timestamp":1750306439568,"version":"3.41.0"},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2016,5,20]],"date-time":"2016-05-20T00:00:00Z","timestamp":1463702400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["CNS-1149285"],"award-info":[{"award-number":["CNS-1149285"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2016,9,12]]},"abstract":"<jats:p>Sliding-window applications, an important class of the digital-signal processing domain, are highly amenable to pipeline parallelism on field-programmable gate arrays (FPGAs). Although memory bandwidth often restricts parallelism for many applications, sliding-window applications can leverage custom buffers, referred to as sliding-window generators, that provide massive input bandwidth that far exceeds the capabilities of external memory. Previous work has introduced a variety of sliding-window generators, but those approaches typically generate at most one window per cycle, which significantly restricts parallelism. In this article, we address this limitation with a parallel sliding-window generator that can generate a configurable number of windows every cycle. 
Although in practice the number of parallel windows is limited by memory bandwidth, we show that even with common bandwidth limitations, the presented generator enables near-linear speedups up to 16x faster than previous FPGA studies that generate a single window per cycle, which were already in some cases faster than graphics-processing units and microprocessors.<\/jats:p>","DOI":"10.1145\/2800789","type":"journal-article","created":{"date-parts":[[2016,5,22]],"date-time":"2016-05-22T01:23:59Z","timestamp":1463880239000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["A Parallel Sliding-Window Generator for High-Performance Digital-Signal Processing on FPGAs"],"prefix":"10.1145","volume":"9","author":[{"given":"Greg","family":"Stitt","sequence":"first","affiliation":[{"name":"University of Florida, Gainesville, FL"}]},{"given":"Eric","family":"Schwartz","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, FL"}]},{"given":"Patrick","family":"Cooke","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, FL"}]}],"member":"320","published-online":{"date-parts":[[2016,5,20]]},"reference":[{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201909)","year":"2009","author":"Asano S.","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.43"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/212094.212141"},{"key":"e_1_2_1_4_1","unstructured":"C. S. S. Burrus and T. W. Parks. 1991. DFT\/FFT and Convolution Algorithms: Theory and Implementation. 
John Wiley & Sons New York NY."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2008.24"},{"key":"e_1_2_1_6_1","unstructured":"Shane Cook. 2013. CUDA Programming: A Developer\u2019s Guide to Parallel Computing with GPUs. Morgan Kaufmann San Francisco CA."},{"volume-title":"Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology. 111--118","year":"2005","author":"Cope B.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1980.1163353"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1764631.1764645"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145704"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400684"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/997163.997199"},{"key":"e_1_2_1_13_1","unstructured":"Mark Harris. 2007. Optimizing Parallel Reduction in CUDA. 
NVIDIA Developer Technology."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/SAAHPC.2011.11"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.120"},{"volume-title":"IEE Proceedings\u2014Computers and Digital Techniques 148","year":"2001","author":"Weinhaudt M.","key":"e_1_2_1_16_1"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2006.29"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2800789","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2800789","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T05:42:44Z","timestamp":1750225364000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2800789"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,5,20]]},"references-count":17,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2016,9,12]]}},"alternative-id":["10.1145\/2800789"],"URL":"https:\/\/doi.org\/10.1145\/2800789","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2016,5,20]]},"assertion":[{"value":"2014-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-05-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}