{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:07:36Z","timestamp":1750306056905,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2017,5,27]],"date-time":"2017-05-27T00:00:00Z","timestamp":1495843200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000038","name":"NSERC","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]},{"name":"University of Toronto ECE Department"},{"name":"Queen Elizabeth II World Telecommunication Congress Graduate Scholarship in Science and Technology"},{"name":"Walter C. Sumner Foundation"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2017,9,30]]},"abstract":"<jats:p>Field-Programmable Gate Arrays (FPGAs) can yield higher performance and lower power than software solutions on CPUs or GPUs. However, designing with FPGAs requires specialized hardware design skills and hours-long CAD processing times. To reduce and accelerate the design effort, we can implement an overlay architecture on the FPGA, on which we then more easily construct the desired system but at a large cost in performance and area relative to a direct FPGA implementation. In this work, we compare the micro-architecture, performance, and area of two soft-processor overlays: the Octavo multi-threaded soft-processor and the MXP soft vector processor. 
To measure the area and performance penalties of these overlays relative to the underlying FPGA hardware, we compare direct FPGA implementations of the micro-benchmarks written in C synthesized with the LegUp HLS tool and also written in the Verilog HDL. Overall, Octavo\u2019s higher operating frequency and MXP\u2019s more efficient code execution result in similar performance from both, within an order of magnitude of direct FPGA implementations, but with a penalty of an order of magnitude greater area.<\/jats:p>","DOI":"10.1145\/3053679","type":"journal-article","created":{"date-parts":[[2017,5,31]],"date-time":"2017-05-31T19:32:40Z","timestamp":1496259160000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Microarchitectural Comparison of the MXP and Octavo Soft-Processor FPGA Overlays"],"prefix":"10.1145","volume":"10","author":[{"given":"Charles Eric","family":"Laforest","sequence":"first","affiliation":[{"name":"University of Toronto, ON, Canada"}]},{"given":"Jason H.","family":"Anderson","sequence":"additional","affiliation":[{"name":"University of Toronto, ON, Canada"}]}],"member":"320","published-online":{"date-parts":[[2017,5,27]]},"reference":[{"volume-title":"Nios II Performance Benchmarks. Retrieved","year":"2014","key":"e_1_2_1_1_1","unstructured":"Altera. 2014. Nios II Performance Benchmarks. Retrieved August 2014 from http:\/\/www.altera.com\/literature\/ds\/ds_nios2_perf.pdf."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2012.25"},{"volume-title":"Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL\u201914)","author":"Canis A.","key":"e_1_2_1_3_1","unstructured":"A. Canis, S. Brown, and J. H. Anderson. 2014. 
Modulo SDC scheduling with recurrence minimization in high-Level synthesis. In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL\u201914). 1--8."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2514740"},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201913)","author":"Capalija D.","key":"e_1_2_1_5_1","unstructured":"D. Capalija and T. S. Abdelrahman. 2013. A high-performance overlay architecture for pipelined execution of data flow graphs. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201913). 1--8."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629443"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2010.17"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2005.1515690"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/EUC.2014.26"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2006.10"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2016.12"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1815968"},{"key":"e_1_2_1_13_1","unstructured":"ITRS. 2011. International Roadmap For Semiconductors: Design. 
Retrieved from http:\/\/www.itrs.net\/Links\/2011itrs\/2011Chapters\/2011Design.pdf."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046207"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.45"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2007.4380625"},{"volume-title":"Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL\u201907)","author":"Labrecque M.","key":"e_1_2_1_17_1","unstructured":"M. Labrecque and J. G. Steffan. 2007. Improving pipelined soft processors with multithreading. In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL\u201907). 210--215."},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201909)","author":"Labrecque Martin","key":"e_1_2_1_18_1","unstructured":"Martin Labrecque and J. Gregory Steffan. 2009. Fast critical sections via thread scheduling for FPGA-based multithreaded processors. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201909). 
18--25."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2008.8"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2014.7082760"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145731"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2013.6718360"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2015.7393130"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1970353.1970355"},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201913)","author":"Murray K. E.","key":"e_1_2_1_25_1","unstructured":"K. E. Murray, S. Whitty, S. Liu, J. Luu, and V. Betz. 2013. Titan: Enabling large and complex benchmarks in academic CAD. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201913). 1--8."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/RECONF.2006.307780"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2554688.2554774"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the IEEE International Conference on Field Programmable Technology (FPT\u201912)","author":"Severance Aaron","year":"2012","unstructured":"Aaron Severance and Guy Lemieux. 2012. VENICE: A compact vector processor for FPGA applications. In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT\u201912). 
261--268."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/2555692.2555698"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2013.6645537"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1723112.1723134"},{"key":"e_1_2_1_32_1","unstructured":"United States Bureau of Labor Statistics. 2012. Occupational Outlook Handbook."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1950413.1950419"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2013.2284281"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145713"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3053679","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3053679","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:03:36Z","timestamp":1750215816000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3053679"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,5,27]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,9,30]]}},"alternative-id":["10.1145\/3053679"],"URL":"https:\/\/doi.org\/10.1145\/3053679","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2017,5,27]]},"assertion":[{"value":"2016-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2017-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-05-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}