{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:08:05Z","timestamp":1750306085487,"version":"3.41.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2018,1,9]],"date-time":"2018-01-09T00:00:00Z","timestamp":1515456000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2018,3,31]]},"abstract":"<jats:p>\n            Soft processors have a role to play in simplifying field-programmable gate array (FPGA) application design as they can be deployed only when needed, and it is easier to write and debug single-threaded software code than create hardware. The breadth of this second role increases when the performance of the soft processor increases, yet the sophisticated out-of-order superscalar approaches that arrived in the mid-1990s are not employed, despite their area cost now being easily tolerable. In this article, we take an important step toward out-of-order execution in soft processors by exploring instruction scheduling in an FPGA substrate. This differs from the hard-processor design problem because the logic substrate is restricted to LUTs, whereas hard processor scheduling circuits employ CAM and wired-OR structures to great benefit. We discuss both circuit and microarchitectural trade-offs and compare three circuit structures for the scheduler, including a new structure called a\n            <jats:italic>fused-logic matrix scheduler<\/jats:italic>\n            . 
Using our optimized circuits, we show that four-issue distributed schedulers with up to 54 entries can be built with the same cycle time as the commercial Nios II\/f soft processor (240MHz). This careful design has the potential to significantly increase both the IPC and raw compute performance of a soft processor, compared to current commercial soft processors.\n          <\/jats:p>","DOI":"10.1145\/3093741","type":"journal-article","created":{"date-parts":[[2018,1,9]],"date-time":"2018-01-09T13:26:11Z","timestamp":1515504371000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["High-Performance Instruction Scheduling Circuits for Superscalar Out-of-Order Soft Processors"],"prefix":"10.1145","volume":"11","author":[{"given":"Henry","family":"Wong","sequence":"first","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]},{"given":"Vaughn","family":"Betz","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]},{"given":"Jonathan","family":"Rose","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Canada"}]}],"member":"320","published-online":{"date-parts":[[2018,1,9]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_2_1_1_1","DOI":"10.1109\/FPT.2010.5681442"},{"volume-title":"Stratix IV Device Handbook","key":"e_1_2_1_2_1"},{"unstructured":"Altera. 2015. Nios II Performance Benchmarks DS-N28162004.  Altera. 2015. 
Nios II Performance Benchmarks DS-N28162004.","key":"e_1_2_1_3_1"},{"volume-title":"Proceedings of the Conference on Microarchitecture (MICRO\u201901)","author":"Brown Mary D.","key":"e_1_2_1_4_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_5_1","DOI":"10.1145\/335231.335263"},{"doi-asserted-by":"publisher","key":"e_1_2_1_6_1","DOI":"10.1109\/TC.2007.70743"},{"doi-asserted-by":"publisher","key":"e_1_2_1_7_1","DOI":"10.1109\/ISCA.2002.1003560"},{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.1109\/4.668985"},{"volume-title":"Proceedings of the International Solid-State Circuits Conference (ISSCC\u201911)","author":"Golden M.","key":"e_1_2_1_9_1"},{"volume-title":"Proceedings of the Conference on Microarchitecture (MICRO\u201901)","year":"2001","author":"Goshima Masahiro","key":"e_1_2_1_10_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_11_1","DOI":"10.1109\/WWC.2001.990739"},{"doi-asserted-by":"publisher","key":"e_1_2_1_13_1","DOI":"10.1109\/ACSSC.2003.1292373"},{"volume-title":"Implementation of Instruction Scheduler on FPGA. Master\u2019s thesis","author":"Johri Abhishek","key":"e_1_2_1_14_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_15_1","DOI":"10.1145\/859618.859623"},{"volume-title":"Technology Insight: Intel Silvermont Microarchitecture. IDF 2013","year":"2013","author":"Kuttanna Belli","key":"e_1_2_1_16_1"},{"volume-title":"Bochs: A portable PC emulator for unix\/X. Linux J.","year":"1996","author":"Lawton Kevin P.","key":"e_1_2_1_17_1"},{"volume-title":"Proceedings of the International Symposium on Computer Architecture (ISCA\u201998)","author":"Lynch W. 
L.","key":"e_1_2_1_18_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_19_1","DOI":"10.1145\/1152154.1152193"},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1109\/HPCA.2001.903249"},{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.1145\/264107.264201"},{"volume-title":"Proceedings of the Design, Automation & Test in Europe Conference (DATE\u201912)","author":"Rosi\u00e8re M.","key":"e_1_2_1_22_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1145\/1250662.1250704"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1145\/1723112.1723116"},{"unstructured":"SPEC. 2000. SPEC CPU95 Results. Retrieved from https:\/\/www.spec.org\/cpu95\/results\/.","key":"e_1_2_1_25_1"},{"volume-title":"Proceedings of the Conference on Microarchitecture (MICRO\u201900)","author":"Stark Jared","key":"e_1_2_1_26_1"},{"volume-title":"Proceedings of the Ontology for the Router Configuration Conference (ORConf\u201917)","year":"2017","author":"Terpstra Wesley","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","first-page":"11","article-title":"5-GHz 32-bit integer execution core in 130-nm dual-VT CMOS","volume":"37","author":"Vangal S.","year":"2002","journal-title":"IEEE JSSC"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.1109\/FCCM.2016.11"}],"container-title":["ACM Transactions on Reconfigurable Technology and 
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3093741","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3093741","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:30:15Z","timestamp":1750217415000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3093741"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,1,9]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,3,31]]}},"alternative-id":["10.1145\/3093741"],"URL":"https:\/\/doi.org\/10.1145\/3093741","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2018,1,9]]},"assertion":[{"value":"2016-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-01-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}