{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T17:09:51Z","timestamp":1774631391328,"version":"3.50.1"},"reference-count":21,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,4,22]],"date-time":"2016-04-22T00:00:00Z","timestamp":1461283200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGARCH Comput. Archit. News"],"published-print":{"date-parts":[[2016,4,22]]},"abstract":"<jats:p>Coarse-grained overlay architectures have been shown to be effective when paired with general purpose processors, offering software-like programmability, fast compilation, and improved design productivity. These architectures enable general purpose hardware accelerators, allowing hardware design at a higher level of abstraction, but at the cost of area and performance overheads. This paper examines the DySER overlay architecture as a hardware accelerator paired with a general purpose processor in a hybrid FPGA such as the Xilinx Zynq. We evaluate the DySER architecture mapped on the Xilinx Zynq and show that it suffers from a significant area and performance overhead. We then propose an improved functional unit architecture using the flexibility of the DSP48E1 primitive which results in a 2.5 times frequency improvement and 25% area reduction compared to the original functional unit architecture. 
We demonstrate that this improvement results in the routing architecture becoming the bottleneck in performance.<\/jats:p>","DOI":"10.1145\/2927964.2927970","type":"journal-article","created":{"date-parts":[[2016,4,25]],"date-time":"2016-04-25T19:51:13Z","timestamp":1461613873000},"page":"28-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Adapting the DySER Architecture with DSP Blocks as an Overlay for the Xilinx Zynq"],"prefix":"10.1145","volume":"43","author":[{"given":"Abhishek Kumar","family":"Jain","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore"}]},{"given":"Xiangwei","family":"Li","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]},{"given":"Suhaib A.","family":"Fahmy","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]},{"given":"Douglas L.","family":"Maskell","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2016,4,22]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"(2013) Zynq-7000 technical reference manual. Xilinx Ltd. {Online}. 
Available:http:\/\/www.xilinx.com\/support\/documentation\/user guides\/ug585-Zynq-7000-TRM.pdf"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11265-014-0884-1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2435227.2435259"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/LES.2014.2314390"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/LES.2011.2167713"},{"key":"e_1_2_1_6_1","first-page":"1","volume-title":"A high-performance overlay architecture for pipelined execution of data flow graphs,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL)","author":"Capalija D.","year":"2013","unstructured":"D. Capalija and T. S. Abdelrahman, \"A high-performance overlay architecture for pipelined execution of data flow graphs,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), 2013, pp. 1--8."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6168949"},{"key":"e_1_2_1_8_1","volume-title":"Efficient mapping of mathematical expressions into DSP blocks,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL)","author":"Ronak B.","year":"2014","unstructured":"B. Ronak and S. A. 
Fahmy, \"Efficient mapping of mathematical expressions into DSP blocks,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), 2014."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2629443"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2013.21"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSD.2012.111"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2012.51"},{"key":"e_1_2_1_13_1","first-page":"503","volume-title":"Dynamically specialized datapaths for energy efficient computing,\" in International Symposium on High Performance Computer Architecture (HPCA)","author":"Govindaraju V.","year":"2011","unstructured":"V. Govindaraju, C.-H. Ho, and K. Sankaralingam, \"Dynamically specialized datapaths for energy efficient computing,\" in International Symposium on High Performance Computer Architecture (HPCA), 2011, pp. 503--514."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2002.997877"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1233307.1233308"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2004.65"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2015.15"},{"key":"e_1_2_1_18_1","volume-title":"Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation,\" Energy (mJ)","author":"Ho C.-H.","unstructured":"C.-H. Ho, V. Govindaraju, T. Nowatzki, Z. Marzec, P. Agarwal, C. Frericks, R. Cofell, J. Benson, and K. 
Sankaralingam, \"Performance evaluation of a DySER FPGA prototype system spanning the compiler, microarchitecture, and hardware implementation,\" Energy (mJ), vol. 5, no. 10, p. 15."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2380403.2380427"},{"key":"e_1_2_1_20_1","first-page":"400","volume-title":"An area-efficient partially reconfigurable crossbar switch with low reconfiguration delay,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL)","author":"Hoo C. H.","year":"2012","unstructured":"C. H. Hoo and A. Kumar, \"An area-efficient partially reconfigurable crossbar switch with low reconfiguration delay,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), 2012, pp. 400--406."},{"key":"e_1_2_1_21_1","first-page":"1","volume-title":"Efficient implementation of virtual coarse grained reconfigurable arrays on FPGAs,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL)","author":"Heyse K.","year":"2013","unstructured":"K. Heyse, T. Davidson, E. Vansteenkiste, K. Bruneel, and D. 
Stroobandt, \"Efficient implementation of virtual coarse grained reconfigurable arrays on FPGAs,\" in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), 2013, pp. 1--8."}],"container-title":["ACM SIGARCH Computer Architecture News"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2927964.2927970","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2927964.2927970","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:56:21Z","timestamp":1750222581000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2927964.2927970"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,4,22]]},"references-count":21,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,4,22]]}},"alternative-id":["10.1145\/2927964.2927970"],"URL":"https:\/\/doi.org\/10.1145\/2927964.2927970","relation":{},"ISSN":["0163-5964"],"issn-type":[{"value":"0163-5964","type":"print"}],"subject":[],"published":{"date-parts":[[2016,4,22]]},"assertion":[{"value":"2016-04-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}