{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:08:55Z","timestamp":1750306135049,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2017,6,27]],"date-time":"2017-06-27T00:00:00Z","timestamp":1498521600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2017,9,30]]},"abstract":"<jats:p>By using resource sharing field-programmable gate array (FPGA) compute engines, we can reduce the performance gap between soft scalar CPUs and resource-intensive custom datapath designs. This article demonstrates that Thread- and Instruction-Level parallel Template architecture (TILT), a programmable FPGA-based horizontally microcoded compute engine designed to highly utilize floating point (FP) functional units (FUs), can improve significantly the average throughput of eight FP-intensive applications compared to a soft scalar CPU (similar to a FP-extended Nios). 
For eight benchmark applications, we show that: (i) a base TILT configuration having a single instance for each FU type can improve the performance over a soft scalar CPU by 15.8 \u00d7 , while requiring on average 26% of the custom datapaths\u2019 area; (ii) selectively increasing the number of FUs can more than double TILT\u2019s average throughput, reducing the custom-datapath-throughput-gap from 576 \u00d7 to 14 \u00d7 ; and (iii) replicated instances of the most computationally dense TILT configuration that fit within the area of each custom datapath design can reduce the gap to 8.27 \u00d7 , while replicated instances of application-tuned configurations of TILT can reduce the custom-datapath-throughput-gap to an average of 5.22 \u00d7 , and up to 3.41 \u00d7 for the Matrix Multiply benchmark. Last, we present methods for design space reduction, and we correctly predict the computationally densest design for seven out of eight benchmarks.<\/jats:p>","DOI":"10.1145\/3079757","type":"journal-article","created":{"date-parts":[[2017,6,28]],"date-time":"2017-06-28T12:27:07Z","timestamp":1498652827000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Reducing the Performance Gap between Soft Scalar CPUs and Custom Hardware with TILT"],"prefix":"10.1145","volume":"10","author":[{"given":"Ilian","family":"Tili","sequence":"first","affiliation":[{"name":"University of Toronto, Ontario, Canada"}]},{"given":"Kalin","family":"Ovtcharov","sequence":"additional","affiliation":[{"name":"University of Toronto, Ontario, Canada"}]},{"given":"J. 
Gregory","family":"Steffan","sequence":"additional","affiliation":[{"name":"University of Toronto, Ontario, Canada"}]}],"member":"320","published-online":{"date-parts":[[2017,6,27]]},"reference":[{"volume-title":"Proceedings of the International Conference on Field Programmable Technology (FPT\u201910)","author":"Anjam F.","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1964.tb04103.x"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1086\/260062"},{"volume-title":"Proceedings of the Effective control for pipelined computers (COMPCON\u201990)","author":"Davidson E. S.","key":"e_1_2_1_4_1"},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201905)","author":"Dimond R.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"J. A. Fisher. 1979. The Optimization of Horizontal Microcode Within and Beyond Basic Blocks: An Application of Processor Scheduling with Resources. Ph.D. Dissertation. New York University.","DOI":"10.2172\/5752434"},{"volume-title":"Embedded Computing: A VLIW Approach to Architecture, Compilers and Tools","year":"2005","author":"Fisher J. A.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2006.10"},{"key":"e_1_2_1_9_1","unstructured":"E. G. Haug. 2013. Black Scholes Code. Retrieved from http:\/\/www.espenhaug.com\/black_scholes.html. 
(2013)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1113\/jphysiol.1952.sp004764"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.9.6.841"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046207"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2009.14"},{"volume-title":"Proceedings of the International Conference on Field Programmable Technology (FPT\u201911)","author":"Kapre N.","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2011.2173199"},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201907)","author":"Labrecque M.","key":"e_1_2_1_16_1"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/53990.54022"},{"volume-title":"Proceedings of ACM High Performance Graphics","year":"2011","author":"Lee W. J.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2011.51"},{"key":"e_1_2_1_20_1","unstructured":"Compiler LLVM. 2012. The LLVM Compiler Infrastructure. Retrieved from http:\/\/llvm.org. Version 3.1."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-6596\/256\/1\/012026"},{"volume-title":"Proceedings of the 1995 Conference on Imaging Science and Technology (IST\u201995)","author":"Mann S.","key":"e_1_2_1_22_1"},{"volume-title":"Proceedings of the 2002 IEEE International Conference on Field-Programmable Technology (FPT\u201902)","author":"Mei B.","key":"e_1_2_1_23_1"},{"volume-title":"Proceedings of the Field Programmable Logic and Application, 13th International Conference (FPL\u201903)","author":"Mei B.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","unstructured":"MESA. 2013a. Matrix Inverse Code. 
Retrieved from http:\/\/express.ece.ucsb.edu\/benchmark\/mesa\/invert_matrix_general.html."},{"key":"e_1_2_1_26_1","unstructured":"MESA. 2013b. Matrix Multiply Code. Retrieved from http:\/\/express.ece.ucsb.edu\/benchmark\/mesa\/matmul.html."},{"key":"e_1_2_1_27_1","unstructured":"G. D. Micheli. 1994. Synthesis and Optimization of Digital Circuits. McGraw-Hill."},{"volume-title":"Proceedings of the IEICE Transactions on Information and Systems E82-D. 389--397","author":"Miyamori T.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2007.4380768"},{"key":"e_1_2_1_30_1","unstructured":"NVidia. 2013a. Gaussian Blur Benchmark Code. Retrieved from http:\/\/http.developer.nvidia.com\/GPUGems3\/gpugems3_ch40.html."},{"key":"e_1_2_1_31_1","unstructured":"NVidia. 2013b. N Body Benchmark Code. Retrieved from http:\/\/http.developer.nvidia.com\/GPUGems3\/gpugems3_ch31.html."},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201913)","author":"Ovtcharov Kalin","key":"e_1_2_1_32_1"},{"volume-title":"Proceedings of the IEEE International Conference on Reconfigurable Computing and FPGA\u2019s (ReConFig\u201906)","author":"Saghir M. A. 
R.","key":"e_1_2_1_33_1"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.859540"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950413.1950419"},{"volume-title":"Proceedings of the IEEE International Conference on Computer Science and Automation Engineering (CSAE\u201911)","author":"Xu F.","key":"e_1_2_1_36_1"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2002.804276"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3079757","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3079757","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:37:14Z","timestamp":1750217834000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3079757"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,6,27]]},"references-count":37,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,9,30]]}},"alternative-id":["10.1145\/3079757"],"URL":"https:\/\/doi.org\/10.1145\/3079757","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2017,6,27]]},"assertion":[{"value":"2016-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-06-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}