{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T16:44:04Z","timestamp":1761324244942,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2019,10,7]],"date-time":"2019-10-07T00:00:00Z","timestamp":1570406400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior\u2013Brasil","doi-asserted-by":"crossref","award":["001"],"award-info":[{"award-number":["001"]}],"id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico\u2013CNPq"},{"name":"Funda\u00e7\u00e3o de Amparo \u00e0 Pesquisa do Estado de Minas Gerais\u2013FAPEMIG"},{"name":"Paderborn Center for Parallel Computing\/Germany, Intel Academic Compute Environment"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2019,10,31]]},"abstract":"<jats:p>In this work, we propose a framework called REconfigurable Accelerator DeploY (READY), the first framework to support polynomial runtime mapping of dataflow applications in high-performance CPU-FPGA platforms. READY introduces an efficient mapping with fine-grained multithreading onto an overlay architecture that hides the latency of a global interconnection network. In addition to our overlay architecture, we show how this system helps solve some of the challenges for FPGA cloud computing adoption in high-performance computing. 
The framework encapsulates dataflow descriptions by using a target independent, high-level API, and a dataflow model that allows for explicit spatial and temporal parallelism. READY directly maps the dataflow kernels onto the accelerator. Our tool is flexible and extensible and provides the infrastructure to explore different accelerator designs. We validate READY on the Intel Harp platform, and our experimental results show an average 2x execution runtime improvement when compared to an 8-thread multi-core processor.<\/jats:p>","DOI":"10.1145\/3358187","type":"journal-article","created":{"date-parts":[[2019,10,10]],"date-time":"2019-10-10T13:13:05Z","timestamp":1570713185000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["READY"],"prefix":"10.1145","volume":"18","author":[{"given":"Lucas Bragan\u00e7a Da","family":"Silva","sequence":"first","affiliation":[{"name":"Universidade Federal de Vi\u00e7osa, Florestal, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ricardo","family":"Ferreira","sequence":"additional","affiliation":[{"name":"Universidade Federal de Vi\u00e7osa, Vi\u00e7osa, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Canesche","sequence":"additional","affiliation":[{"name":"Universidade Federal de Vi\u00e7osa, Vi\u00e7osa, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marcelo M.","family":"Menezes","sequence":"additional","affiliation":[{"name":"Universidade Federal de Vi\u00e7osa, Vi\u00e7osa, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maria D.","family":"Vieira","sequence":"additional","affiliation":[{"name":"Universidade Federal de Vi\u00e7osa, Florestal, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeronimo","family":"Penha","sequence":"additional","affiliation":[{"name":"Centro Federal de Educa\u00e7\u00e3o 
Tecnol\u00f3gica de Minas Gerais, Leopoldina, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Jamieson","sequence":"additional","affiliation":[{"name":"Miami University, Oxford, Ohio, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 Augusto M.","family":"Nacif","sequence":"additional","affiliation":[{"name":"Universidade Federal de Vi\u00e7osa, Florestal, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon. 2018. Elastic Compute Cloud - Amazon EC2 - AWS. http:\/\/aws.amazon.com\/ec2\/."},{"volume-title":"Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 701--714","author":"Asghari Hodjat","key":"e_1_2_1_2_1","unstructured":"Hodjat Asghari Esfeden et al. 2019. CORF: Coalescing operand register file for GPUs. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 701--714."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2857278"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2013.6645515"},{"key":"e_1_2_1_5_1","unstructured":"FPGA Cross-Platform and Application Developers. [n.d.]. 
Simplify software integration for FPGA accelerators with OPAE."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/800139.804563"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211332.3211342"},{"key":"e_1_2_1_8_1","unstructured":"Renesas Electronics. 2019. STP Engine (IP Core). www.renesas.com\/br\/en\/products\/power-management\/pmic\/stp-engine.html."},{"volume-title":"IEEE ICCD - International Conference on Computer Design.","author":"Di L.","key":"e_1_2_1_9_1","unstructured":"L. Di Tucci et al. 2017. The role of CAD frameworks in heterogeneous FPGA-based cloud systems. In IEEE ICCD - International Conference on Computer Design."},{"volume-title":"Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM, 195--204","author":"Ricardo","key":"e_1_2_1_10_1","unstructured":"Ricardo Ferreira et al. 2011. An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture. In Proceedings of the 14th International Conference on Compilers, Architectures and Synthesis for Embedded Systems. ACM, 195--204."},{"volume-title":"2011 9th IEEE International Conference on Industrial Informatics. 810--815","author":"Ferreira R.","key":"e_1_2_1_11_1","unstructured":"R. Ferreira, J. Vendramini, and M. Nacif. 2011. Dynamic reconfigurable multicast interconnections by using radix-4 multistage networks in FPGA. 
In 2011 9th IEEE International Conference on Industrial Informatics. 810--815."},{"key":"e_1_2_1_12_1","volume-title":"26th International Conference on Field Programmable Logic and Applications.","author":"Gupta P. K.","year":"2016","unstructured":"P. K. Gupta. 2016. Accelerating datacenter workloads. In 26th International Conference on Field Programmable Logic and Applications."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463209.2488756"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2593100"},{"key":"e_1_2_1_15_1","unstructured":"Intel. 2019. Xeon E52680 2.4Ghz Specification. https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/91754\/intel-xeon-processor-e5-2680-v4-35m-cache-2-40-ghz.html."},{"volume-title":"IEEE FCCM - Annual International Symposium on Field-Programmable Custom Computing Machines.","author":"Jain Abhishek Kumar","key":"e_1_2_1_16_1","unstructured":"Abhishek Kumar Jain, Xiangwei Li, Pranjul Singhai, Douglas L. Maskell, and Suhaib A. Fahmy. 2016. DeCO: A DSP block based FPGA accelerator overlay with low overhead interconnect. 
In IEEE FCCM - Annual International Symposium on Field-Programmable Custom Computing Machines."},{"key":"e_1_2_1_17_1","volume-title":"IEEE FCCM - International Symposium on Field-Programmable Custom Computing Machines.","author":"Jain Abhishek Kumar","year":"2013","unstructured":"Abhishek Kumar Jain, Scott Lloyd, and Maya Gokhale. 2013. Microscope on memory: MPSoC-enabled computer memory system assessments. In IEEE FCCM - International Symposium on Field-Programmable Custom Computing Machines."},{"volume-title":"Automation Test in Europe Conference Exhibition (DATE). 1628--1633","author":"Jain A. K.","key":"e_1_2_1_18_1","unstructured":"A. K. Jain, D. L. Maskell, and S. A. Fahmy. 2016. Throughput oriented FPGA overlays using DSP blocks. In Design, Automation & Test in Europe Conference & Exhibition (DATE). 1628--1633."},{"key":"e_1_2_1_19_1","volume-title":"Fahmy","author":"Jain Abhishek Kumar","year":"2017","unstructured":"Abhishek Kumar Jain, Douglas L. Maskell, and Suhaib A. Fahmy. 2017. Resource-aware just-in-time OpenCL compiler for coarse-grained FPGA overlays. arXiv preprint arXiv:1705.02730."},{"key":"e_1_2_1_20_1","first-page":"12","article-title":"Enhancing the area efficiency of FPGAs with hard circuits using shadow clusters. Very Large Scale Integration (VLSI) Systems","volume":"18","author":"Jamieson P. A.","year":"2010","unstructured":"P. A. Jamieson and J. Rose. 2010. 
Enhancing the area efficiency of FPGAs with hard circuits using shadow clusters. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on 18, 12.","journal-title":"IEEE Transactions on"},{"key":"e_1_2_1_21_1","article-title":"ULP-SRP: Ultra low-power samsung reconfigurable processor for biomedical applications","volume":"7","author":"Kim Changmoo","year":"2014","unstructured":"Changmoo Kim et al. 2014. ULP-SRP: Ultra low-power samsung reconfigurable processor for biomedical applications. ACM Transaction Reconfigurable Technology System 7, 3, Article 22, 15 pages.","journal-title":"ACM Transaction Reconfigurable Technology System"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2005.35"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1975.224157"},{"key":"e_1_2_1_24_1","volume-title":"Douglas L Maskell, and Suhaib A Fahmy.","author":"Li Xiangwei","year":"2018","unstructured":"Xiangwei Li, Abhishek Kumar Jain, Douglas L Maskell, and Suhaib A Fahmy. 2018. A time-multiplexed FPGA overlay with linear interconnect. In IEEE Design, Automation & Test in Europe Conference & Exhibition (DATE). 
1075--1080."},{"volume-title":"Eighth International Conference On Advances in Computing, Electronics and Electrical Technology (CEET).","author":"Li Xiangwei","key":"e_1_2_1_25_1","unstructured":"Xiangwei Li, Cheng Fei Phung, and Douglas L. Maskell. 2018. FPGA overlays: Hardware--based computing for the masses. Eighth International Conference On Advances in Computing, Electronics and Electrical Technology (CEET)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2015.7393130"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45234-8_7"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2011.2176730"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2018.2874079"},{"key":"e_1_2_1_30_1","unstructured":"A. Pipelined. 1986. Shared resource MIMD computer by B. In Smith et al. and published in the Proceedings of the 1978 International Conference on Parallel Processing."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2678373.2665678"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2145694.2145708"},{"volume-title":"30th Annual International Symposium on Computer Architecture, 2003. Proceedings. IEEE, 422--433","author":"Karthikeyan","key":"e_1_2_1_33_1","unstructured":"Karthikeyan Sankaralingam et al. 2003. 
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In 30th Annual International Symposium on Computer Architecture, 2003. Proceedings. IEEE, 422--433."},{"key":"e_1_2_1_34_1","volume-title":"WaveScalar. In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE Computer Society, 291","author":"Swanson Steven","year":"2003","unstructured":"Steven Swanson, Ken Michelson, Andrew Schwerin, and Mark Oskin. 2003. WaveScalar. In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture. IEEE Computer Society, 291."},{"key":"e_1_2_1_35_1","volume-title":"Pyverilog: A python-based hardware design processing toolkit for verilog HDL. In Applied Reconfigurable Computing.","author":"Takamaeda-Yamazaki Shinya","year":"2015","unstructured":"Shinya Takamaeda-Yamazaki. 2015. Pyverilog: A python-based hardware design processing toolkit for verilog HDL. In Applied Reconfigurable Computing."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964910"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the","author":"Thornton James E.","year":"1964","unstructured":"James E. Thornton. 1964. Parallel operation in the control data 6600. In Proceedings of the October 27-29, 1964, Fall Joint Computer Conference, Part II: Very High Speed Computer Systems. 
ACM, 33--40."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/321439.321449"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings (Cat. No. 02CH37353)","volume":"2","author":"Wu Ting","year":"2002","unstructured":"Ting Wu, Chi-Ying Tsui, and Mounir Hamdi. 2002. A 2 Gb\/s 256* 256 CMOS crossbar switch fabric core design using pipelined MUX. In 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No. 02CH37353), Vol. 2."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358187","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3358187","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:32:58Z","timestamp":1750199578000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358187"}},"subtitle":["A Fine-Grained Multithreading Overlay Framework for Modern CPU-FPGA Dataflow Applications"],"short-title":[],"issued":{"date-parts":[[2019,10,7]]},"references-count":39,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2019,10,31]]}},"alternative-id":["10.1145\/3358187"],"URL":"https:\/\/doi.org\/10.1145\/3358187","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2019,10,7]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2019-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}