{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T12:11:16Z","timestamp":1761394276306,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,1,1]],"date-time":"2013-01-01T00:00:00Z","timestamp":1356998400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["CNS-0914474"],"award-info":[{"award-number":["CNS-0914474"]}],"id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,1]]},"abstract":"<jats:p>Recent architectural trends have focused on increased parallelism via multicore processors and increased heterogeneity via accelerator devices (e.g., graphics-processing units, field-programmable gate arrays). Although these architectures have significant performance and energy potential, application designers face many device-specific challenges when choosing an appropriate accelerator or when customizing an algorithm for an accelerator. To help address this problem, in this article we thoroughly evaluate convolution, one of the most common operations in digital-signal processing, on multicores, graphics-processing units, and field-programmable gate arrays. 
Whereas many previous application studies evaluate a specific usage of an application, this article assists designers with design space exploration for numerous use cases by analyzing effects of different input sizes, different algorithms, and different devices, while also determining Pareto-optimal trade-offs between performance and energy.<\/jats:p>","DOI":"10.1145\/2400682.2400684","type":"journal-article","created":{"date-parts":[[2013,1,22]],"date-time":"2013-01-22T15:28:56Z","timestamp":1358868536000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["A performance and energy comparison of convolution on GPUs, FPGAs, and multicore processors"],"prefix":"10.1145","volume":"9","author":[{"given":"Jeremy","family":"Fowers","sequence":"first","affiliation":[{"name":"University of Florida"}]},{"given":"Greg","family":"Brown","sequence":"additional","affiliation":[{"name":"University of Florida"}]},{"given":"John","family":"Wernsing","sequence":"additional","affiliation":[{"name":"University of Florida"}]},{"given":"Greg","family":"Stitt","sequence":"additional","affiliation":[{"name":"University of Florida"}]}],"member":"320","published-online":{"date-parts":[[2013,1,20]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Altera Inc. 2011a. Buy FPGA and CPLD Devices Stratix IV EP4SE820. 2010. http:\/\/www.buyaltera.com\/scripts\/partsearch.dll\/multisearch?site=ALTERA&lang=EN&keywords=EP4SE820."},{"key":"e_1_2_1_2_1","unstructured":"Altera Inc. 2011. FFT MegaCore Function User Guide. http:\/\/www.altera.com\/literature\/ug\/ug_fft.pdf."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2010.144"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2011.41"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.43"},{"key":"e_1_2_1_6_1","unstructured":"Brookwood N. March 2010. AMD fusion family of APUs: Enabling a superior immersive PC experience. http:\/\/sites.amd.com\/us\/Documents\/48423B_fusion_whitepaper_WEB.pdf"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/SASP.2008.4570793"},{"volume-title":"Proceedings of the International Symposium on Applied Reconfigurable Computing. 110--121","author":"Dong Y.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Eles P. Peng Z. Kunchinsinski K. and Doboli A. 1997. System level hardware\/software partitioning based on simulated annealing and tabu search. In Design Automation for Embedded Systems. 5--32.","DOI":"10.1023\/A:1008857008151"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145704"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840301"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1155\/2008\/930250"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2011.11"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2002.807764"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/968280.968304"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5160980"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2010.84"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000832.2000842"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2011.40"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2011.27"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1982.1163949"},{"volume-title":"Proceedings of the IEEE National Aerospace and Electronics Conference (NAECON'08)","author":"Merchant S.","key":"e_1_2_1_22_1"},{"first-page":"57","volume-title":"Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA'08)","author":"Nelson B. E.","key":"e_1_2_1_23_1"},{"key":"e_1_2_1_24_1","unstructured":"Nvidia Corp. 2011a. CUDA CUFFT library. http:\/\/developer.nvidia.com\/cuda-toolkit-40"},{"key":"e_1_2_1_25_1","unstructured":"Nvidia Corp. 2011b. NVIDIA tegra 2. http:\/\/www.nvidia.com\/object\/tegra-2.html"},{"key":"e_1_2_1_26_1","unstructured":"Nvidia Corp. 2011c. NVIDIA tesla workstations. http:\/\/www.nvidia.com\/object\/personal-supercomputing.html"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2008.917757"},{"volume-title":"Proceedings of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines. 219--228","author":"Underwood K. D.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1862648.1862649"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS.2009.110"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2400682.2400684","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2400682.2400684","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:35:01Z","timestamp":1750239301000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2400682.2400684"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,1]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,1]]}},"alternative-id":["10.1145\/2400682.2400684"],"URL":"https:\/\/doi.org\/10.1145\/2400682.2400684","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2013,1]]},"assertion":[{"value":"2011-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}