{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T16:25:07Z","timestamp":1774628707296,"version":"3.50.1"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,6,3]],"date-time":"2020-06-03T00:00:00Z","timestamp":1591142400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2020,9,30]]},"abstract":"<jats:p>Reducing the precision of deep neural network (DNN) inference accelerators can yield large efficiency gains with little or no accuracy degradation compared to half or single precision floating-point by enabling more multiplication operations per unit area. A wide range of precisions fall on the pareto-optimal curve of hardware efficiency vs. accuracy with no single precision dominating, making the variable precision capabilities of FPGAs very valuable. We propose three types of logic block architectural enhancements and fully evaluate a total of six architectures that improve the area efficiency of multiplications and additions implemented in the soft fabric. Increasing the LUT fracturability and adding two adders to the ALM (4-bit Adder Double Chain architecture) leads to a 1.5\u00d7 area reduction for arithmetic heavy machine learning (ML) kernels, while increasing their speed. In addition, this architecture also reduces the logic area of general applications by 6%, while increasing the critical path delay by only 1%. However, our highest impact option, which adds a 9-bit shadow multiplier to the logic clusters, reduces the area and critical path delay of ML kernels by 2.4\u00d7 and 1.2\u00d7, respectively. 
These large gains come at a cost of 15% logic area increase for general applications.<\/jats:p>","DOI":"10.1145\/3393668","type":"journal-article","created":{"date-parts":[[2020,6,3]],"date-time":"2020-06-03T16:06:02Z","timestamp":1591200362000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":35,"title":["FPGA Logic Block Architectures for Efficient Deep Learning Inference"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4157-8584","authenticated-orcid":false,"given":"Mohamed","family":"Eldafrawy","sequence":"first","affiliation":[{"name":"University of Toronto, Toronto, Ontario, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8044-1644","authenticated-orcid":false,"given":"Andrew","family":"Boutros","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Ontario, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1044-4460","authenticated-orcid":false,"given":"Sadegh","family":"Yazdanshenas","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Ontario, Canada"}]},{"given":"Vaughn","family":"Betz","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, Ontario, Canada"}]}],"member":"320","published-online":{"date-parts":[[2020,6,3]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Y. Cao. 2018. Predictive Technology Model (PTM). Retrieved from http:\/\/ptm.asu.edu\/."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2004.824300"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-C.1973.223648"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Custom Integrated Circuits Conference. 551--554","author":"Betz V."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"V. Betz and J. Rose. 1998. 
How much logic should go in an FPGA logic block. IEEE Design & Test of Computers 15 1 (1998) 10--15.","DOI":"10.1109\/54.655177"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications. 1--8.","author":"\u00a0al A. Boutros","year":"2018"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242898"},{"key":"e_1_2_1_8_1","volume-title":"FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. 94--103","author":"\u00a0al A. Boutros","year":"2019"},{"key":"e_1_2_1_9_1","volume-title":"Conference workshop: FPGAs in 2032","author":"Burich M.","year":"2012"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays. 108--116","author":"\u00a0al S. Chandrakar","year":"2015"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the International Conference on Field-Programmable Technology. 34--41","author":"Chiasson C."},{"key":"e_1_2_1_12_1","unstructured":"Intel Corporation. 2005. Stratix GX Transceiver User Guide."},{"key":"e_1_2_1_13_1","unstructured":"Xilinx Corporation. 2007. Virtex-II Platform FPGA User Guide."},{"key":"e_1_2_1_14_1","unstructured":"Xilinx Corporation. 2007. Virtex-II Pro and Virtex-II Pro X FPGA User Guide."},{"key":"e_1_2_1_15_1","unstructured":"M. Deo et\u00a0al. 2019. Intel Stratix 10 MX devices solve the memory bandwidth challenge. 
Intel Whitepaper."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Symposium on Computer Architecture, 1--14","author":"\u00a0al J. Fowers","year":"2018"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the International Symposium on Computer Architecture. 243--254","author":"\u00a0al S. Han","year":"2016"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"\u00a0al K. He","year":"2016"},{"key":"e_1_2_1_19_1","unstructured":"Intel Corporation. 2017. Intel Stratix 10 logic array blocks and adaptive logic modules user guide (UG-S10LAB)."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the International Conference on Field Programmable Technology. 1--8.","author":"Jamieson P."},{"key":"e_1_2_1_21_1","unstructured":"A. Krizhevsky et\u00a0al. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2009.2031318"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays. 202--211","author":"\u00a0al M. Langhammer","year":"2019"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the International Symposium on Field Programmable Gate Arrays. 117--125","author":"Langhammer M."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/360276.360299"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays. 14--20","author":"\u00a0al D. 
Lewis","year":"2005"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the International Symposium on Field Programmable Gate Arrays. 159--168","author":"\u00a0al D. Lewis","year":"2016"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the International Symposium on Field Programmable Gate Arrays. 12--20","author":"\u00a0al D. M.","year":"2003"},{"key":"e_1_2_1_29_1","first-page":"1","article-title":"VTR 7.0: Next generation architecture and CAD system for FPGAs","volume":"7","author":"\u00a0al J. Luu","year":"2014","journal-title":"ACM Trans. Reconfig. Technol. Syst."},{"key":"e_1_2_1_30_1","volume-title":"WRPN: Wide reduced-precision networks. arXiv preprint arXiv:1709.01134.","author":"\u00a0al A. Mishra","year":"2017"},{"key":"e_1_2_1_31_1","article-title":"VTR 8: High performance CAD and customizable FPGA architecture modelling","author":"\u00a0al Kevin E.","year":"2020","journal-title":"ACM Trans. Reconfig. Technol. Syst. 0, ja, 1"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the International Conference on Field-Programmable Technology. 77--84","author":"\u00a0al E. Nurvitadhi","year":"2016"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays. 5--14","author":"\u00a0al E. Nurvitadhi","year":"2017"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the International Symposium on Field-Programmable Gate Arrays. 26--35","author":"\u00a0al J.","year":"2016"},{"key":"e_1_2_1_35_1","unstructured":"E. Real et\u00a0al. 2018. Regularized evolution for image classifier architecture search. arXiv preprint arXiv:1802.01548."},{"key":"e_1_2_1_36_1","first-page":"1013","article-title":"Architecture of field-programmable gate arrays","volume":"81","author":"\u00a0al J. Rose","year":"1993","journal-title":"IEEE J. 
Solid-State Circ."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications. 1--8.","author":"\u00a0al V. Rybalkin","year":"2018"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2392104"},{"key":"e_1_2_1_39_1","volume-title":"et\u00a0al","author":"Mike","year":"2019"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41928-018-0059-3"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the International Conference on Field Programmable Technology. 9--16","author":"Yazdanshenas S."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3301298"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3393668","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3393668","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:17Z","timestamp":1750200077000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3393668"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,3]]},"references-count":42,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9,30]]}},"alternative-id":["10.1145\/3393668"],"URL":"https:\/\/doi.org\/10.1145\/3393668","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,3]]},"assertion":[{"value":"2019-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2020-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-06-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}