{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T15:53:41Z","timestamp":1761580421858,"version":"3.41.0"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2004,6,7]],"date-time":"2004-06-07T00:00:00Z","timestamp":1086566400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2006,7]]},"abstract":"<jats:p>We describe a new processing architecture, known as a warp processor, that utilizes a field-programmable gate array (FPGA) to improve the speed and energy consumption of a software binary executing on a microprocessor. Unlike previous approaches that also improve software using an FPGA but do so using a special compiler, a warp processor achieves these improvements completely transparently and operates from a standard binary. A warp processor dynamically detects the binary's critical regions, reimplements those regions as a custom hardware circuit in the FPGA, and replaces the software region by a call to the new hardware implementation of that region. While not all benchmarks can be improved using warp processing, many can, and the improvements are dramatically better than those achievable by more traditional architecture improvements. The hardest part of warp processing is that of dynamically reimplementing code regions on an FPGA, requiring partitioning, decompilation, synthesis, placement, and routing tools, all having to execute with minimal computation time and data memory so as to coexist on chip with the main processor. We describe the results of developing our warp processor. 
We developed a custom FPGA fabric specifically designed to enable lean place and route tools, and we developed extremely fast and efficient versions of partitioning, decompilation, synthesis, technology mapping, placement, and routing. Warp processors achieve overall application speedups of 6.3X with energy savings of 66% across a set of embedded benchmark applications. We further show that our tools utilize acceptably small amounts of computation and memory which are far less than traditional tools. Our work illustrates the feasibility and potential of warp processing, and we can foresee the possibility of warp processing becoming a feature in a variety of computing domains, including desktop, server, and embedded applications.<\/jats:p>","DOI":"10.1145\/1142980.1142986","type":"journal-article","created":{"date-parts":[[2006,7,25]],"date-time":"2006-07-25T14:14:26Z","timestamp":1153836866000},"page":"659-681","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":101,"title":["Warp Processors"],"prefix":"10.1145","volume":"11","author":[{"given":"Roman","family":"Lysecky","sequence":"first","affiliation":[{"name":"University of Arizona, Tucson, AZ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Greg","family":"Stitt","sequence":"additional","affiliation":[{"name":"University of California, Riverside, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frank","family":"Vahid","sequence":"additional","affiliation":[{"name":"University of California, Riverside, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2004,6,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Altera Corp. 2006. Customer showcase. http:\/\/www.altera.com\/corporate\/cust_successes\/customer_showcase\/view_product\/csh-vproduct-nios.jsp."},
{"key":"e_1_2_1_2_1","unstructured":"Altera Corp. 2005. Excalibur embedded processor solutions. http:\/\/www.altera.com\/products\/devices\/excalibur\/exc-index.html."},{"key":"e_1_2_1_3_1","unstructured":"Atmel Corp. 2005. FPSLIC (AVR with FPGA) http:\/\/www.atmel.com\/products\/FPSLIC\/."},{"volume-title":"Proceedings of the International Workshop on Hardware\/Software Codesign (CODES), 62--69","author":"Balboni A.","key":"e_1_2_1_4_1"},{"volume-title":"Proceedings of the Embedded Signal Processing Conference (GSPx).","author":"Banerjee P.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","unstructured":"Berkeley Design Technology Inc. 2004. http:\/\/www.bdti.com\/articles\/info_eet0207fpga.htm#DSPEnhanced%20FPGAs."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Betz V. Rose J. and Marquardt A. 1999. Architecture and CAD for Deep-Submicron FPGAs. Kluwer Academic Hingham Mass.","DOI":"10.1007\/978-1-4615-5145-4"},
{"volume-title":"Proceedings of the International Workshop on Field Programmable Logic and Applications (FPLA), 213--222","author":"Betz V.","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013623303037"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/359094.359101"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/268806.268810"},{"volume-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA), 97--105","author":"Chen W.","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","unstructured":"Christensen F. 2004. A scalable software-defined radio development system. Xcell J. Winter."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.766746"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1007\/3-540-61053-7_55","article-title":"Structuring decompiled graphs. In Proceedings of the International Conference on Compiler Construction","volume":"1060","author":"Cifuentes C.","year":"1996","journal-title":"Lecture Notes in Computer Science"},{"volume-title":"Tech. Rep. 439.","year":"1998","author":"Cifuentes C.","key":"e_1_2_1_16_1"},{"volume-title":"Proceedings of the Workshop on Binary Translation, 12--22","author":"Cifuentes C.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","unstructured":"Critical Blue. 2005. http:\/\/www.criticalblue.com."},{"volume-title":"Cray XD1 brings high-bandwidth supercomputing to the mid-market. White Paper prepared for Cray","year":"2004","author":"Brown Associates","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","unstructured":"EEMBC. 2005. The Embedded Microprocessor Benchmark Consortium. http:\/\/www.eembc.org."},
{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1008857008151","article-title":"System level hardware\/software partitioning based on simulated annealing and Tabu search","volume":"2","author":"Eles P.","year":"1997","journal-title":"Kluwer's Design Automation for Embedded Systems"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/54.245964"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.661251"},{"volume-title":"Proceedings of the Symposium on FPGAs for Custom Computing Machines (FCCM), 126","author":"Gokhale M.","key":"e_1_2_1_24_1"},{"volume-title":"Proceedings of the Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 117--124","author":"Gordon-Ross A.","key":"e_1_2_1_25_1"},{"volume-title":"Proceedings of the Design Automation and Test in Europe Conference (DATE), 112--117","year":"2005","author":"Guo Z.","key":"e_1_2_1_26_1"},{"volume-title":"Proceedings of the Symposium on FPGAs for Custom Computing Machines (FCCM), 12--21","author":"Hauser J.","key":"e_1_2_1_27_1"},{"volume-title":"Proceedings of the Design Automation Conference (DAC), 691--696","author":"Henkel J.","key":"e_1_2_1_28_1"},{"volume-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA), 233--241","author":"Keane J.","key":"e_1_2_1_29_1"},{"volume-title":"Proceedings of the International Symposium on Microarchitecture (MICRO), 330--335","author":"Lee C.","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2003.820522"},{"volume-title":"Proceedings of the Design Automation and Test in Europe Conference (DATE), 10480","author":"Lysecky R.","key":"e_1_2_1_32_1"},{"volume-title":"Proceedings of the Design Automation Conference (DAC), 334--337","author":"Lysecky R.","key":"e_1_2_1_33_1"},{"volume-title":"Proceedings of the Symposium on Field-Programmable Custom Computing Machines 
(FCCM), 57--62","year":"2005","author":"Lysecky R.","key":"e_1_2_1_34_1"},{"volume-title":"Proceedings of the Design Automation Conference (DAC), 954--959","author":"Lysecky R.","key":"e_1_2_1_35_1"},{"volume-title":"Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 241--243","author":"Malik A.","key":"e_1_2_1_36_1"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/92.820764"},{"key":"e_1_2_1_38_1","unstructured":"Matsumoto C. 2000. Triscend adds 32-bit configurable SoC line. EE Times http:\/\/www.eet.com\/story\/OEG20000828S0015."},{"volume-title":"Proceedings of the International Conference on Computer-Aided Design (ICCAD), 39--42","author":"Memik G.","key":"e_1_2_1_39_1"},{"volume-title":"Proceedings of the Design Automation Conference (DAC), 389--394","author":"Mittal G.","key":"e_1_2_1_40_1"},{"key":"e_1_2_1_41_1","unstructured":"Morris K. 2005. Cray goes FPGA. FPGA and Programmable Logic J. April."},{"volume-title":"C: The Art of Scientific Computing","year":"1992","author":"Press W.","key":"e_1_2_1_42_1"},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1109\/4.121549","article-title":"The effect of logic block architecture on FPGA performance","volume":"27","author":"Singh S.","year":"1992","journal-title":"IEEE J. 
Solid-State Circuits."},{"volume-title":"Proceedings of the Design Automation Conference (DAC), 250--255","author":"Stitt G.","key":"e_1_2_1_44_1"},{"volume-title":"Proceedings of the International Conference on Computer Aided Design (ICCAD).","author":"Stitt G.","key":"e_1_2_1_45_1"},{"volume-title":"Proceedings of the International Conference on Computer Aided Design (ICCAD), 164--170","author":"Stitt G.","key":"e_1_2_1_46_1"},{"volume-title":"Proceedings of the International Conference on Hardware\/Software Codesign and System Synthesis (CODES+ISSS), 285--290","author":"Stitt G.","key":"e_1_2_1_47_1"},{"key":"e_1_2_1_48_1","unstructured":"Tensilica Inc. 2006. XPRES compiler automatically generate processors from standard C code. http:\/\/www.tensilica.com\/products\/xpres.htm."},{"key":"e_1_2_1_49_1","unstructured":"Triscend Corp. 2003. http:\/\/www.triscend.com."},{"volume-title":"Proceedings of the Conference on Compiler, Architecture, and Synthesis for Embedded Systems (CASES), 116--125","author":"Venkataramani G.","key":"e_1_2_1_50_1"},{"volume-title":"Proceedings of the Conference on Compiler, Architecture and Synthesis for Embedded Systems (CASES). 10","year":"2004","author":"Vissers K.","key":"e_1_2_1_51_1"},{"key":"e_1_2_1_52_1","unstructured":"Xilinx Inc. 2006. http:\/\/www.xilinx.com."},{"key":"e_1_2_1_53_1","unstructured":"Xilinx Inc. 2005a. Customer success stories http:\/\/www.xilinx.com\/company\/success\/csprod.htm#embedded."},{"key":"e_1_2_1_54_1","unstructured":"Xilinx Inc. 2005b. Virtex-4 FPGAs http:\/\/www.xilinx.com\/products\/silicon_solutions\/fpgas\/virtex\/virtex4\/index.htm."},
{"key":"e_1_2_1_55_1","unstructured":"Xilinx Inc. 2004a. Partnering for success Xilinx and photonic bridges. http:\/\/www.xilinx.com\/ipcenter\/processor_central\/embedded\/success_PB.pdf."},{"key":"e_1_2_1_56_1","unstructured":"Xilinx Inc. 2004b. Virtex-II Pro\/ProX FPGAs http:\/\/www.xilinx.com\/products\/silicon_solutions\/fpgas\/virtex\/virtex_ii_pro_fpgas\/."},{"key":"e_1_2_1_57_1","unstructured":"Xilinx Inc. 2000a. Xilinx introduces high level language compiler for Virtex FPGAs. Xilinx Press Release. http:\/\/www.xilinx.com\/prs_rls\/00119_forge.htm."},{"key":"e_1_2_1_58_1","unstructured":"Xilinx Inc. 2000b. Xilinx Version 3.3i software doubles clock frequencies. Xilinx Press Release. http:\/\/www.xilinx.com\/prs_rls\/00118_3_3i.htm."},{"volume-title":"Proceeding of the Conference on Supercomputing, article no. 16","author":"Zagha M., B.","key":"e_1_2_1_59_1"},{"volume-title":"Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP), 15--26","author":"Zhang X.","key":"e_1_2_1_60_1"},{"volume-title":"Proceedings of the International Symposium on High-Performance Computer Architectures, 241","author":"Zilles C. 
B.","key":"e_1_2_1_61_1"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1142980.1142986","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1142980.1142986","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T15:06:16Z","timestamp":1750259176000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1142980.1142986"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,6,7]]},"references-count":61,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2006,7]]}},"alternative-id":["10.1145\/1142980.1142986"],"URL":"https:\/\/doi.org\/10.1145\/1142980.1142986","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2004,6,7]]},"assertion":[{"value":"2004-06-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}