{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:17:25Z","timestamp":1775873845379,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T00:00:00Z","timestamp":1699488000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001602","name":"Science Foundation Ireland","doi-asserted-by":"crossref","award":["13\/RC\/2094_P2"],"award-info":[{"award-number":["13\/RC\/2094_P2"]}],"id":[{"id":"10.13039\/501100001602","id-type":"DOI","asserted-by":"crossref"}]},{"name":"European Union\u2019s Horizon 2020 research and innovation programme","award":["754489"],"award-info":[{"award-number":["754489"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,11,30]]},"abstract":"<jats:p>Field-programmable gate array (FPGA)\u2013based accelerators are becoming increasingly popular for deep neural network (DNN) inference due to their ability to scale performance with increasing degrees of specialization with dataflow architectures or custom data type precision. In order to reduce the barrier for software engineers and data scientists to adopt FPGAs, C++- and OpenCL-based design entries with high-level synthesis (HLS) have been introduced. They provide higher abstraction compared with register-transfer level (RTL)\u2013based design. HLS offers faster development time, better maintainability, and more flexibility in code exploration when evaluating several options for multi-dimension tensors, convolutional layers, or different degrees of parallelism. 
For this reason, HLS has been adopted by DNN accelerator generation frameworks such as FINN and hls4ml.<\/jats:p>\n          <jats:p>In this article, we present an alternative backend library for FINN, leveraging RTL. We investigate and evaluate, across a spectrum of design dimensions, the pros and cons of an RTL-based implementation versus the original HLS variant. We show that for smaller design parameters, RTL produces significantly smaller circuits than HLS. For larger circuits, however, the look-up table (LUT) count of the RTL-based design is slightly higher, up to around 15%. On the other hand, HLS consistently requires more flip-flops (FFs; with an orders-of-magnitude difference for smaller designs) and block RAMs (BRAMs; 2\u00d7 more). This also impacts the critical path delay, with RTL producing significantly faster circuits, up to around 80%. RTL also benefits from at least a 10\u00d7 reduction in synthesis time. Finally, the results were validated in practice using two real-world use cases, one a multi-layer perceptron (MLP) used in network intrusion detection and the other a convolutional network called ResNet, used in image recognition. Overall, since HLS frameworks code-generate the hardware design, the benefit of easier design entry is less important. 
As such, the gained benefits in synthesis time together with some design-dependent resource benefits make the RTL abstraction an attractive alternative.<\/jats:p>","DOI":"10.1145\/3547141","type":"journal-article","created":{"date-parts":[[2022,7,14]],"date-time":"2022-07-14T11:16:30Z","timestamp":1657797390000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["On the RTL Implementation of FINN Matrix Vector Unit"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1509-9678","authenticated-orcid":false,"given":"Syed Asad","family":"Alam","sequence":"first","affiliation":[{"name":"School of Computer Science and Statistics, Trinity College Dublin, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3782-4612","authenticated-orcid":false,"given":"David","family":"Gregg","sequence":"additional","affiliation":[{"name":"School of Computer Science and Statistics, Trinity College Dublin, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6183-5077","authenticated-orcid":false,"given":"Giulio","family":"Gambardella","sequence":"additional","affiliation":[{"name":"Synopsys Inc, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3998-7896","authenticated-orcid":false,"given":"Thomas","family":"Preusser","sequence":"additional","affiliation":[{"name":"AMD, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7833-4057","authenticated-orcid":false,"given":"Michaela","family":"Blott","sequence":"additional","affiliation":[{"name":"AMD, Ireland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,11,9]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2010. 
AMBA 4 AXI4-Stream Protocol Specification."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2016.2557298"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470567"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_3_2_6_2","first-page":"217","volume-title":"Proceedings of the ACM International Conference on Computing Frontiers","author":"Bruschi N.","year":"2020","unstructured":"N. Bruschi, A. Garofalo, F. Conti, G. Tagliavini, and D. Rossi. 2020. Enabling mixed-precision quantized neural networks in extreme-edge devices. In Proceedings of the ACM International Conference on Computing Frontiers (Sicily, Catania, Italy, May 2020). 217\u2013220."},{"issue":"5","key":"e_1_3_2_7_2","first-page":"871","article-title":"CMix-NN: Mixed low-precision CNN library for memory-constrained edge devices","volume":"67","author":"Capotondi A.","year":"2020","unstructured":"A. Capotondi, M. Rusci, M. Fariselli, and L. Benini. 2020. CMix-NN: Mixed low-precision CNN library for memory-constrained edge devices. IEEE Trans. Circuits Syst. II 67, 5 (2020), 871\u2013875.","journal-title":"IEEE Trans. Circuits Syst. II"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/MDT.2009.69"},{"key":"e_1_3_2_9_2","first-page":"531","volume-title":"Proc. Int. Conf. Field-Programmable Logic Applicat.","author":"Czajkowski T. S.","year":"2012","unstructured":"T. S. Czajkowski, U. Aydonat, D. Denisenko, J. Freeman, M. Kinsner, D. Neto, J. Wong, P. Yiannacouras, and D. P. Singh. 2012. From OpenCL to high-performance hardware on FPGAs. In Proc. Int. Conf. Field-Programmable Logic Applicat. 531\u2013534."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_11_2","unstructured":"Michaela Blott et al. 2021. FINN: Dataflow compiler for QNN inference on FPGAs. (2021). 
https:\/\/github.com\/xilinx\/finn."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2019.0155"},{"key":"e_1_3_2_13_2","volume-title":"Proc. IEEE Conf. Comput. Vision Pattern Recog. arXiv preprint arXiv:1512.03385","author":"He Kaiming","year":"2015","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. In Proc. IEEE Conf. Comput. Vision Pattern Recog. arXiv preprint arXiv:1512.03385. http:\/\/arxiv.org\/abs\/1512.03385v1."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2017.8280129"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242044"},{"key":"e_1_3_2_17_2","unstructured":"Intel\u00ae. [n.d.]. High Level Synthesis Compiler. Retrieved July 25 2022 from https:\/\/www.intel.com\/content\/www\/us\/en\/software\/programmable\/quartus-prime\/hls-compiler.html."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2018.2795611"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/MDT.2009.83"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10617-012-9096-8"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/MilCIS.2015.7348942"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2017.2671881"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2017.05.014"},{"issue":"7348013","key":"e_1_3_2_25_2","first-page":"12","article-title":"Automatic pipelining and vectorization of scientific code for FPGAs","volume":"2019","author":"Nabi S. W.","year":"2019","unstructured":"S. W. Nabi and W. Vanderbauwhede. 2019. Automatic pipelining and vectorization of scientific code for FPGAs. 
International Journal of Reconfigurable Computing 2019, 7348013 (2019), 12.","journal-title":"International Journal of Reconfigurable Computing"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2513673"},{"key":"e_1_3_2_27_2","first-page":"1","volume-title":"Proc. International Workshop on FPGAs for Software Programmers","author":"Noronha Daniel H.","year":"2018","unstructured":"Daniel H. Noronha, Bahar Salehpour, and Steven J. E. Wilton. 2018. LeFlow: Enabling flexible FPGA high-level synthesis of tensorflow deep neural networks. In Proc. International Workshop on FPGAs for Software Programmers. 1\u20138."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.3333552"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2012.78"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3108545"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT51103.2020.00016"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056834"},{"key":"e_1_3_2_33_2","unstructured":"M. Rastegari V. Ordonez J. Redmon and A. Farhadi. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. 1603.05279:1603.05279v4 [cs.CV]."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/MDT.2009.84"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2017.9"},{"key":"e_1_3_2_36_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arxiv:1409.1556."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_38_2","first-page":"65","volume-title":"Proc. ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays","author":"Umuroglu Y.","year":"2017","unstructured":"Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers. 2017. 
FINN: A framework for fast, scalable binarized neural network inference. In Proc. ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (Monterey, CA). 65\u201374."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL50879.2020.00055"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337929"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2013.6718388"},{"key":"e_1_3_2_42_2","unstructured":"Xilinx. Xilinx Unified Software Development Platform. Retrieved July 25 2022 from https:\/\/www.xilinx.com\/html_docs\/xilinx2020_1\/vitis_doc\/irn1582730075765.html."},{"key":"e_1_3_2_43_2","unstructured":"Xilinx. 2020. https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/xilinx2020_1\/ug892-vivado-design-flows-overview.pdf."},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Tien-Ju Yang Yu-Hsin Chen and Vivienne Sze. 2016. Designing energy-efficient convolutional neural networks using energy-aware pruning. 
arxiv:1611.05128.","DOI":"10.1109\/CVPR.2017.643"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2019.8714724"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3547141","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3547141","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:55Z","timestamp":1750186975000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3547141"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,9]]},"references-count":44,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,11,30]]}},"alternative-id":["10.1145\/3547141"],"URL":"https:\/\/doi.org\/10.1145\/3547141","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,9]]},"assertion":[{"value":"2021-12-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-07-02","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}