{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T15:03:08Z","timestamp":1761663788890,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2018,4,30]],"date-time":"2018-04-30T00:00:00Z","timestamp":1525046400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Singapore MOE Tier-2","award":["MOE2015-T2-2-013"],"award-info":[{"award-number":["MOE2015-T2-2-013"]}]},{"name":"Cisco Research Center","award":["CG#594589"],"award-info":[{"award-number":["CG#594589"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Emerg. Technol. Comput. Syst."],"published-print":{"date-parts":[[2018,4,30]]},"abstract":"<jats:p>FPGA-based hardware accelerators for convolutional neural networks (CNNs) have received attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than GPU counterparts. In this article, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized fully mapped FPGA accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipelines stages. A key advantage of the FPGA accelerator is that its performance is insensitive to data batch size, while the performance of GPU acceleration varies largely depending on the batch size of the data. 
Experimental results show that the proposed accelerator architecture for binary CNNs running on a Virtex-7 FPGA is 8.3\u00d7 faster and 75\u00d7 more energy-efficient than a Titan X GPU for processing online individual requests in small batch sizes. For processing static data in large batch sizes, the proposed solution is on a par with a Titan X GPU in terms of throughput while delivering 9.5\u00d7 higher energy efficiency.<\/jats:p>","DOI":"10.1145\/3154839","type":"journal-article","created":{"date-parts":[[2018,7,26]],"date-time":"2018-07-26T11:58:04Z","timestamp":1532606284000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":47,"title":["A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks"],"prefix":"10.1145","volume":"14","author":[{"given":"Yixing","family":"Li","sequence":"first","affiliation":[{"name":"Arizona State University, AZ, USA"}]},{"given":"Zichuan","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]},{"given":"Kai","family":"Xu","sequence":"additional","affiliation":[{"name":"Arizona State University, AZ, USA"}]},{"given":"Hao","family":"Yu","sequence":"additional","affiliation":[{"name":"Southern University of Science and Technology"}]},{"given":"Fengbo","family":"Ren","sequence":"additional","affiliation":[{"name":"Arizona State University, AZ, USA"}]}],"member":"320","published-online":{"date-parts":[[2018,7,25]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"},
{"key":"e_1_2_1_2_1","unstructured":"A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},
{"volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition 2011 Workshops. 109--116","author":"Farabet C.","key":"e_1_2_1_3_1","unstructured":"C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun. 2011. Neuflow: A runtime reconfigurable dataflow processor for vision. In Proceedings of the Conference on Computer Vision and Pattern Recognition 2011 Workshops. 109--116."},
{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847276"},
{"key":"e_1_2_1_5_1","volume-title":"Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems. 3123--3131.","author":"Courbariaux M.","year":"2015","unstructured":"M. Courbariaux, Y. Bengio, and J. P. David. 2015. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems. 3123--3131."},
{"key":"e_1_2_1_6_1","unstructured":"W. Sung, S. Shin, and K. Hwang. 2015. Resiliency of deep neural networks under quantization. arXiv:1511.06488."},
{"key":"e_1_2_1_7_1","unstructured":"Z. Cheng, D. Soudry, Z. Mao, and Z. Lan. 2015. Training binary multilayer neural networks for image classification using expectation backpropagation. arXiv:1503.03562."},
{"key":"e_1_2_1_8_1","unstructured":"M. Kim and P. Smaragdis. 2016. Bitwise neural networks. arXiv:1601.06071."},
{"key":"e_1_2_1_9_1","volume-title":"Binarynet: Training deep neural networks with weights and activations constrained to +1 or &minus;1. arXiv:1602.02830.","author":"Courbariaux M.","year":"2016","unstructured":"M. Courbariaux and Y. Bengio. 2016. Binarynet: Training deep neural networks with weights and activations constrained to +1 or &minus;1. arXiv:1602.02830."},
{"volume-title":"Proceedings of the European Conference on Computer Vision. 525--542","author":"Rastegari M.","key":"e_1_2_1_10_1","unstructured":"M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proceedings of the European Conference on Computer Vision. 525--542."},
{"volume-title":"Proceedings of the 32nd International Conference on Machine Learning.","author":"Ioffe S.","key":"e_1_2_1_11_1","unstructured":"S. Ioffe and C. Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning."},
{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847265"},
{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Y. LeCun, Y. Bengio, and G. Hinton. 2015. Deep learning. Nature 521, 7553, 436--444.","DOI":"10.1038\/nature14539"},
{"key":"e_1_2_1_14_1","unstructured":"I. Goodfellow, Y. Bengio, and A. Courville. 2016. Deep Learning. MIT Press."},
{"key":"e_1_2_1_15_1","unstructured":"A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},
{"volume-title":"Proceedings of the 32nd International Conference on Learning Representations.","author":"Simonyan K.","key":"e_1_2_1_16_1","unstructured":"K. Simonyan and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 32nd International Conference on Learning Representations."},
{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},
{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3005348"},
{"volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2625--2631","author":"Tang W.","key":"e_1_2_1_19_1","unstructured":"W. Tang, G. Hua, and L. Wang. 2017. How to train a compact binary neural network with high accuracy? In Proceedings of the 31st AAAI Conference on Artificial Intelligence. 2625--2631."},
{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007192"},
{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021741"},
{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021736"},
{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021727"},
{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021698"},
{"volume-title":"Proceedings of the Hot Chips Conference. 28","author":"Ouyang J.","key":"e_1_2_1_25_1","unstructured":"J. Ouyang, S. Lin, W. Qi, Y. Wang, B. Yu, and S. Jiang. 2016. SDA: Software-defined accelerator for large-scale DNN systems. In Proceedings of the Hot Chips Conference. 28."},
{"volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915)","author":"Han S.","key":"e_1_2_1_26_1","unstructured":"S. Han, J. Pool, J. Tran, and W. Dally. 2015. Learning both weights and connections for efficient neural network. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS\u201915). 1135--1143."},
{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.357"},
{"key":"e_1_2_1_28_1","unstructured":"N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, and R. Boyle. 2017. In-datacenter performance analysis of a tensor processing unit. arXiv:1704.04760."},
{"volume-title":"Proceedings of the 2016 IEEE Computer Society Annual Symposium on VLSI. 236--241","author":"Andri R.","key":"e_1_2_1_29_1","unstructured":"R. Andri, L. Cavigelli, D. Rossi, and L. Benini. 2016. YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights. In Proceedings of the 2016 IEEE Computer Society Annual Symposium on VLSI. 236--241."},
{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"D. Markovi\u0107 and R. W. Brodersen. 2012. DSP Architecture Design Essentials. Springer Science & Business Media.","DOI":"10.1007\/978-1-4419-9660-2"}],"container-title":["ACM Journal on Emerging Technologies in Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3154839","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3154839","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:11:27Z","timestamp":1750212687000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3154839"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,4,30]]},"references-count":30,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2018,4,30]]}},"alternative-id":["10.1145\/3154839"],"URL":"https:\/\/doi.org\/10.1145\/3154839","relation":{},"ISSN":["1550-4832","1550-4840"],"issn-type":[{"type":"print","value":"1550-4832"},{"type":"electronic","value":"1550-4840"}],"subject":[],"published":{"date-parts":[[2018,4,30]]},"assertion":[{"value":"2017-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-07-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}