{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:49:27Z","timestamp":1750308567422,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,3,28]],"date-time":"2019-03-28T00:00:00Z","timestamp":1553731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2019,6,30]]},"abstract":"<jats:p>Convolutional Neural Network (ConvNet or CNN) algorithms are characterized by a large number of model parameters and high computational complexity. These two requirements have made it challenging for implementations on resource-limited FPGAs. The challenges are magnified when considering designs for low-end FPGAs. While previous work has demonstrated successful ConvNet implementations with high-end FPGAs, this article presents a ConvNet accelerator design that enables the implementation of complex deep ConvNet architectures on resource-constrained FPGA platforms aimed at the IoT market. We call the design \u201cFeatherNet\u201d for its light resource utilization. The implementations are VHDL-based providing flexibility in design optimizations. As part of the design process, new methods are introduced to address several design challenges. The first method is a novel stride-aware graph-based method targeted at ConvNets that aims at achieving efficient signal processing with reduced resource utilization. The second method addresses the challenge of determining the minimal precision arithmetic needed while preserving high accuracy. 
For this challenge, we propose variable-width dynamic fixed-point representations combined with a layer-by-layer design-space pruning heuristic across the different layers of the deep ConvNet model. The third method aims at achieving a modular design that can support different types of ConvNet layers while ensuring low resource utilization. For this challenge, we propose the modules to be relatively small and composed of computational filters that can be interconnected to build an entire accelerator design. These model elements can be easily configured through HDL parameters (e.g., layer type, mask size, stride, etc.) to meet the needs of specific ConvNet implementations and thus they can be reused to implement a wide variety of ConvNet architectures. The fourth method addresses the challenge of design portability between two different FPGA vendor platforms, namely, Intel\/Altera and Xilinx. For this challenge, we propose to instantiate the device-specific hardware blocks needed in each computational filter, rather than relying on the synthesis tools to infer these blocks, while keeping track of the similarities and differences between the two platforms. We believe that the solutions to these design challenges further advance knowledge as they can benefit designers and other researchers using similar devices or facing similar challenges. Our results demonstrated the success of addressing the design challenges and achieving low (30%) resource utilization for the low-end FPGA platforms: Zedboard and Cyclone V. The design overcame the limitation of designs targeted for high-end platforms and that cannot fit on low-end IoT platforms. 
Furthermore, our design showed superior performance results (measured in terms of [Frame\/s\/W] per Dollar) compared to high-end optimized designs.<\/jats:p>","DOI":"10.1145\/3306202","type":"journal-article","created":{"date-parts":[[2019,3,28]],"date-time":"2019-03-28T12:23:24Z","timestamp":1553775804000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["FeatherNet"],"prefix":"10.1145","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1280-9291","authenticated-orcid":false,"given":"Raghid","family":"Morcel","sequence":"first","affiliation":[{"name":"American University of Beirut, Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hazem","family":"Hajj","sequence":"additional","affiliation":[{"name":"American University of Beirut, Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mazen A. R.","family":"Saghir","sequence":"additional","affiliation":[{"name":"American University of Beirut, Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haitham","family":"Akkary","sequence":"additional","affiliation":[{"name":"American University of Beirut, Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hassan","family":"Artail","sequence":"additional","affiliation":[{"name":"American University of Beirut, Beirut, Lebanon"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rahul","family":"Khanna","sequence":"additional","affiliation":[{"name":"Intel Corporation, Hillsboro, Oregon, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anil","family":"Keshavamurthy","sequence":"additional","affiliation":[{"name":"Intel Corporation, Hillsboro, Oregon, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,3,28]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"4","article-title":"Tactics to directly map CNN graphs on embedded FPGAs","volume":"9","author":"Abdelouahab Kamel","year":"2017","unstructured":"Kamel Abdelouahab , Maxime Pelcat , Jocelyn Serot , Cedric Bourrasset , and Francois Berry . 2017 . Tactics to directly map CNN graphs on embedded FPGAs . IEEE Embed. Syst. Lett. 9 , 4 (Dec. 2017), 113--116. Kamel Abdelouahab, Maxime Pelcat, Jocelyn Serot, Cedric Bourrasset, and Francois Berry. 2017. Tactics to directly map CNN graphs on embedded FPGAs. IEEE Embed. Syst. Lett. 9, 4 (Dec. 2017), 113--116.","journal-title":"IEEE Embed. Syst. Lett."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195664"},{"key":"e_1_2_1_3_1","unstructured":"Avnet. 2017. ZedBoard. Retrieved from http:\/\/zedboard.org\/product\/zedboard.  Avnet. 2017. ZedBoard. Retrieved from http:\/\/zedboard.org\/product\/zedboard."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021738"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/VLSIT.2014.6894411"},{"key":"e_1_2_1_6_1","unstructured":"BVLC. 2001. Model Zoo. Retrieved from http:\/\/ccrma.stanford.edu\/~jos\/bayes\/bayes.html.  BVLC. 2001. Model Zoo. Retrieved from http:\/\/ccrma.stanford.edu\/~jos\/bayes\/bayes.html."},{"key":"e_1_2_1_7_1","unstructured":"Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. Retrieved from http:\/\/arxiv.org\/abs\/1602.02830.  Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. 
Retrieved from http:\/\/arxiv.org\/abs\/1602.02830."},{"key":"e_1_2_1_8_1","unstructured":"Alpha Data. 2017. An Open Source FPGA CNN Library. Retrieved from ftp:\/\/ftp.alpha-data.com\/pub\/appnotes\/cnn\/ad-an-0055_v1_0.pdf.  Alpha Data. 2017. An Open Source FPGA CNN Library. Retrieved from ftp:\/\/ftp.alpha-data.com\/pub\/appnotes\/cnn\/ad-an-0055_v1_0.pdf."},{"key":"e_1_2_1_9_1","volume-title":"Multidimensional Digital Signal Processing","author":"Dudgeon Dan E.","year":"1983","unstructured":"Dan E. Dudgeon and Russell M. Mersereau . 1983 . Multidimensional Digital Signal Processing. Prentice-Hall , Englewood Cliffs, NJ. Dan E. Dudgeon and Russell M. Mersereau. 1983. Multidimensional Digital Signal Processing. Prentice-Hall, Englewood Cliffs, NJ."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2010.5537908"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2011.5981829"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2009.5272559"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML\u201915)","author":"Gupta Suyog","year":"2015","unstructured":"Suyog Gupta , Ankur Agrawal , Kailash Gopalakrishnan , and Pritish Narayanan . 2015 . Deep learning with limited numerical precision . In Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML\u201915) . JMLR.org, 1737--1746. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=3045118.3045303. Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep learning with limited numerical precision. In Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML\u201915). JMLR.org, 1737--1746. 
Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=3045118.3045303."},{"key":"e_1_2_1_14_1","volume-title":"Ristretto: Hardware-oriented approximation of convolutional neural networks. Master\u2019s thesis","author":"Gysel Philipp M.","year":"2016","unstructured":"Philipp M. Gysel . 2016 . Ristretto: Hardware-oriented approximation of convolutional neural networks. Master\u2019s thesis . University of California , Davis, Davis, CA . Philipp M. Gysel. 2016. Ristretto: Hardware-oriented approximation of convolutional neural networks. Master\u2019s thesis. University of California, Davis, Davis, CA."},{"key":"e_1_2_1_15_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding","author":"Han Song","year":"2015","unstructured":"Song Han , Huizi Mao , and William J. Dally . 2015 . Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Retrieved from http:\/\/arxiv.org\/abs\/1510.00149. Song Han, Huizi Mao, and William J. Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Retrieved from http:\/\/arxiv.org\/abs\/1510.00149."},{"key":"e_1_2_1_16_1","volume-title":"Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation","author":"Hauck Scott","year":"2007","unstructured":"Scott Hauck and Andre DeHon . 2007 . Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation . Morgan Kaufmann Publishers Inc., San Francisco, CA. Scott Hauck and Andre DeHon. 2007. Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation. Morgan Kaufmann Publishers Inc., San Francisco, CA."},{"key":"e_1_2_1_17_1","first-page":"1","article-title":"Quantized neural networks: Training neural networks with low precision weights and activations","volume":"18","author":"Hubara Itay","year":"2017","unstructured":"Itay Hubara , Matthieu Courbariaux , Daniel Soudry , Ran El-Yaniv , and Yoshua Bengio . 2017 . 
Quantized neural networks: Training neural networks with low precision weights and activations . J. Mach. Learn. Res. 18 , 1 (Jan. 2017), 6869--6898. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=3122009.3242044. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2017. Quantized neural networks: Training neural networks with low precision weights and activations. J. Mach. Learn. Res. 18, 1 (Jan. 2017), 6869--6898. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=3122009.3242044.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_18_1","unstructured":"Intel. 2014. Intel SDK for OpenCL Applications. Retrieved from https:\/\/software.intel.com\/en-us\/intel-opencl.  Intel. 2014. Intel SDK for OpenCL Applications. Retrieved from https:\/\/software.intel.com\/en-us\/intel-opencl."},{"key":"e_1_2_1_19_1","unstructured":"Intel\/Altera. 2017. Cyclone V. Retrieved from https:\/\/www.altera.com\/products\/fpga\/cyclone-series\/cyclone-v\/overview.html.  Intel\/Altera. 2017. Cyclone V. Retrieved from https:\/\/www.altera.com\/products\/fpga\/cyclone-series\/cyclone-v\/overview.html."},{"key":"e_1_2_1_20_1","unstructured":"Intel\/Altera. 2017. Cyclone V-GX FPGA Development Kit. Retrieved from https:\/\/www.altera.com\/products\/boards_and_kits\/dev-kits\/altera\/kit-cyclone-v-gx.html.  Intel\/Altera. 2017. Cyclone V-GX FPGA Development Kit. Retrieved from https:\/\/www.altera.com\/products\/boards_and_kits\/dev-kits\/altera\/kit-cyclone-v-gx.html."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2627369.2631644"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_1_24_1","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012 . 
ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 1097--1105. Retrieved from http:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 1097--1105. Retrieved from http:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_2_1_26_1","unstructured":"Fengfu Li and Bin Liu. 2016. Ternary weight networks. Retrieved from http:\/\/arxiv.org\/abs\/1605.04711.  Fengfu Li and Bin Liu. 2016. Ternary weight networks. Retrieved from http:\/\/arxiv.org\/abs\/1605.04711."},{"volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916)","author":"Lin Darryl D.","key":"e_1_2_1_27_1","unstructured":"Darryl D. Lin , Sachin S. Talathi , and V. Sreekanth Annapureddy . 2016. Fixed point quantization of deep convolutional networks . In Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916) . JMLR.org, 2849--2858. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=3045390.3045690. Darryl D. Lin, Sachin S. Talathi, and V. Sreekanth Annapureddy. 2016. Fixed point quantization of deep convolutional networks. In Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916). JMLR.org, 2849--2858. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=3045390.3045690."},{"key":"e_1_2_1_28_1","volume-title":"Discrete-time Signal Processing (2nd Ed.)","author":"Oppenheim Alan V.","year":"1999","unstructured":"Alan V. Oppenheim , Ronald W. Schafer , and John R. Buck . 1999 . Discrete-time Signal Processing (2nd Ed.). 
Prentice-Hall , Inc., Upper Saddle River, NJ. Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. 1999. Discrete-time Signal Processing (2nd Ed.). Prentice-Hall, Inc., Upper Saddle River, NJ."},{"volume-title":"VLSI Digital Signal Processing Systems Design and Implementation","author":"Parhi Keshab K.","key":"e_1_2_1_29_1","unstructured":"Keshab K. Parhi . 1999. VLSI Digital Signal Processing Systems Design and Implementation . Wiley & Sons, Inc., New York, NY. Keshab K. Parhi. 1999. VLSI Digital Signal Processing Systems Design and Implementation. Wiley & Sons, Inc., New York, NY."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CAC.2017.8243585"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195659"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080221"},{"key":"e_1_2_1_35_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. Retrieved from http:\/\/arxiv.org\/abs\/1409.1556.  Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. Retrieved from http:\/\/arxiv.org\/abs\/1409.1556."},{"key":"e_1_2_1_36_1","unstructured":"Hemendra Singh. 2018. How Much Does it Cost to Develop an IoT Application? Retrieved from http:\/\/customerthink.com\/how-much-does-it-cost-to-develop-an-iot-application\/.  Hemendra Singh. 2018. How Much Does it Cost to Develop an IoT Application? 
Retrieved from http:\/\/customerthink.com\/how-much-does-it-cost-to-develop-an-iot-application\/."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Deep Learning and Unsupervised Feature Learning NIPS Workshop","volume":"1","author":"Vanhoucke Vincent","unstructured":"Vincent Vanhoucke , Andrew Senior , and Mark Z. Mao . 2011. Improving the speed of neural networks on CPUs . In Proceedings of the Deep Learning and Unsupervised Feature Learning NIPS Workshop , Vol. 1 . Citeseer, 4. Vincent Vanhoucke, Andrew Senior, and Mark Z. Mao. 2011. Improving the speed of neural networks on CPUs. In Proceedings of the Deep Learning and Unsupervised Feature Learning NIPS Workshop, Vol. 1. Citeseer, 4."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_1_40_1","unstructured":"Xilinx. 2017. Zynq-7000: All Programmable SoC with Hardware and Software Programmability. Retrieved from https:\/\/www.xilinx.com\/products\/silicon-devices\/soc\/zynq-7000.html.  Xilinx. 2017. Zynq-7000: All Programmable SoC with Hardware and Software Programmability. Retrieved from https:\/\/www.xilinx.com\/products\/silicon-devices\/soc\/zynq-7000.html."},{"key":"e_1_2_1_41_1","unstructured":"Xilinx. 2018. Vivado High-Level Synthesis. Retrieved from https:\/\/www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html.  Xilinx. 2018. Vivado High-Level Synthesis. Retrieved from https:\/\/www.xilinx.com\/products\/design-tools\/vivado\/integration\/esl-design.html."},{"volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201914)","author":"Zeiler Matthew D.","key":"e_1_2_1_42_1","unstructured":"Matthew D. Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks . In Proceedings of the European Conference on Computer Vision (ECCV\u201914) . Springer International Publishing, 818--833. Matthew D. 
Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV\u201914). Springer International Publishing, 818--833."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},{"key":"e_1_2_1_44_1","volume-title":"DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients.","author":"Zhou Shuchang","year":"2016","unstructured":"Shuchang Zhou , Yuxin Wu , Zekun Ni , Xinyu Zhou , He Wen , and Yuheng Zou . 2016 . DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. Retrieved from http:\/\/arxiv.org\/abs\/1606.06160. Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. Retrieved from http:\/\/arxiv.org\/abs\/1606.06160."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3306202","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3306202","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:04:21Z","timestamp":1750273461000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3306202"}},"subtitle":["An Accelerated Convolutional Neural Network Design for Resource-constrained 
FPGAs"],"short-title":[],"issued":{"date-parts":[[2019,3,28]]},"references-count":44,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,6,30]]}},"alternative-id":["10.1145\/3306202"],"URL":"https:\/\/doi.org\/10.1145\/3306202","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2019,3,28]]},"assertion":[{"value":"2017-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-03-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}