{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T10:57:43Z","timestamp":1777546663272,"version":"3.51.4"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2015,3,6]],"date-time":"2015-03-06T00:00:00Z","timestamp":1425600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["CNS-0914474 and CNS-1149285"],"award-info":[{"award-number":["CNS-0914474 and CNS-1149285"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2015,3,6]]},"abstract":"<jats:p>\n            The increasing usage of hardware accelerators such as Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) has significantly increased application design complexity. Such complexity results from a larger design space created by numerous combinations of accelerators, algorithms, and hw\/sw partitions. Exploration of this increased design space is critical due to widely varying performance and energy consumption for each accelerator when used for different application domains and different use cases. To address this problem, numerous studies have evaluated specific applications across different architectures. In this article, we analyze an important domain of applications, referred to as\n            <jats:italic>sliding-window applications<\/jats:italic>\n            , implemented on FPGAs, GPUs, and multicore CPUs. For each device, we present optimization strategies and analyze use cases where each device is most effective. 
The results show that, for large input sizes, FPGAs can achieve speedups of up to 5.6\u00d7 and 58\u00d7 compared to GPUs and multicore CPUs, respectively, while also using up to an order of magnitude less energy. For small input sizes and applications with frequency-domain algorithms, GPUs generally provide the best performance and energy.\n          <\/jats:p>","DOI":"10.1145\/2659000","type":"journal-article","created":{"date-parts":[[2015,3,9]],"date-time":"2015-03-09T19:03:01Z","timestamp":1425927781000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":23,"title":["A Tradeoff Analysis of FPGAs, GPUs, and Multicores for Sliding-Window Applications"],"prefix":"10.1145","volume":"8","author":[{"given":"Patrick","family":"Cooke","sequence":"first","affiliation":[{"name":"University of Florida, Gainesville, USA"}]},{"given":"Jeremy","family":"Fowers","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, USA"}]},{"given":"Greg","family":"Brown","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, USA"}]},{"given":"Greg","family":"Stitt","sequence":"additional","affiliation":[{"name":"University of Florida, Gainesville, USA"}]}],"member":"320","published-online":{"date-parts":[[2015,3,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Altera. 2013. Altera\u2019s User-Customizable ARM-Based SoC. (2013). 
Retrieved from http:\/\/www.altera.com\/literature\/br\/br-soc-fpga.pdf."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2009.5272532"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.43"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.2"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2008.24"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/SASP.2008.4570793"},{"key":"#cr-split#-e_1_2_1_7_1.1","unstructured":"B. Cope P. Y. K. Cheung W. Luk and S. Witt. 2005. Have GPUs made FPGAs redundant in the field of video processing?"},{"key":"#cr-split#-e_1_2_1_7_1.2","doi-asserted-by":"crossref","unstructured":"In Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology. 111--118. DOI: http:\/\/dx.doi.org\/10.1109\/FPT.2005.1568533","DOI":"10.1109\/FPT.2005.1568533"},{"key":"#cr-split#-e_1_2_1_7_1.3","unstructured":"B. Cope P. Y. K. Cheung W. Luk and S. Witt. 2005. Have GPUs made FPGAs redundant in the field of video processing?"},{"key":"#cr-split#-e_1_2_1_7_1.4","doi-asserted-by":"crossref","unstructured":"In Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology. 111--118. DOI: http:\/\/dx.doi.org\/10.1109\/FPT.2005.1568533","DOI":"10.1109\/FPT.2005.1568533"},{"key":"#cr-split#-e_1_2_1_7_1.5","unstructured":"B. Cope P. Y. K. Cheung W. Luk and S. Witt. 2005. Have GPUs made FPGAs redundant in the field of video processing?"},{"key":"#cr-split#-e_1_2_1_7_1.6","doi-asserted-by":"crossref","unstructured":"In Proceedings of the 2005 IEEE International Conference on Field-Programmable Technology. 111--118. 
DOI: http:\/\/dx.doi.org\/10.1109\/FPT.2005.1568533","DOI":"10.1109\/FPT.2005.1568533"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/1764631.1764645"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145704"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ULTSYM.1995.495835"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/997163.997199"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/968280.968304"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the Florida Conference on Recent Advances in Robotics.","author":"Hunt L.","year":"2009","unstructured":"L. Hunt. 2009. Fault-aware machine vision in small unmanned systems. In Proceedings of the Florida Conference on Recent Advances in Robotics."},{"key":"e_1_2_1_15_1","unstructured":"Intel Corporation. 2013. Intel SDK for OpenCL Applications 2013 Optimization Guide. Retrieved from http:\/\/software.intel.com\/sites\/products\/documentation\/ioclsdk\/2013\/Intel_SDK_for_OpenCL_Applications_2013_Optimization_Guide.pdf."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2010.84"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2007.896065"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the HiPC Conference.","author":"Mehta Sanyam","year":"2010","unstructured":"Sanyam Mehta, Arindam Misra, Ayush Singhal, Praveen Kumar, and Ankush Mittal. 2010. 
A high-performance parallel implementation of sum of absolute differences algorithm for motion estimation using CUDA. In Proceedings of the HiPC Conference."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365490.1365500"},{"key":"e_1_2_1_20_1","unstructured":"NVIDIA. 2013. Tegra 4 Processors Smartphones Tablets. Retrieved from http:\/\/www.nvidia.com\/object\/tegra.html."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917757"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2011.120"},{"key":"e_1_2_1_23_1","unstructured":"Victor Podlozhnyuk. 2007. FFT-based 2D Convolution. Retrieved from http:\/\/developer.download.nvidia.com\/compute\/cuda\/2_2\/sdk\/website\/projects\/convolutionFFT2D\/doc\/convolutionFFT2D.pdf."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications (TENCON\u201997)","volume":"2","author":"Porter R. B.","year":"1997","unstructured":"R. B. Porter and N. W. Bergmann. 1997. 
A generic implementation framework for FPGA based stereo matching. In Proceedings of the IEEE Region 10 Annual Conference on Speech and Image Technologies for Computing and Telecommunications (TENCON\u201997), Vol. 2. 461--464. DOI: http:\/\/dx.doi.org\/10.1109\/TENCON.1997.648244"},{"key":"e_1_2_1_25_1","first-page":"265","article-title":"Information theoretic learning","volume":"1","author":"Principe Jose C.","year":"2000","unstructured":"Jose C. Principe, Dongxin Xu, and John Fisher. 2000. Information theoretic learning. Unsupervised Adaptive Filtering 1 (2000), 265--319.","journal-title":"Unsupervised Adaptive Filtering"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/2150916.2150918"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2010.69"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/1025123.1025831"},{"key":"e_1_2_1_29_1","unstructured":"Xilinx. 2013. All Programmable SoC. 
Retrieved from http:\/\/www.xilinx.com\/products\/silicon-devices\/soc\/index.htm."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2006.29"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2003.1206117"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2659000","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2659000","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:25Z","timestamp":1750227205000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2659000"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,3,6]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,3,6]]}},"alternative-id":["10.1145\/2659000"],"URL":"https:\/\/doi.org\/10.1145\/2659000","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,3,6]]},"assertion":[{"value":"2013-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-03-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}