{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T13:03:04Z","timestamp":1774962184799,"version":"3.50.1"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T00:00:00Z","timestamp":1652140800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program","doi-asserted-by":"crossref","award":["2018YFB2202604"],"award-info":[{"award-number":["2018YFB2202604"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"University Synergy Innovation Program of Anhui Province","award":["GXXT-2019-030"],"award-info":[{"award-number":["GXXT-2019-030"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>Binarized neural networks (BNNs) and batch normalization (BN) have already become typical techniques in artificial intelligence today. Unfortunately, the massive accumulation and multiplication in BNN models bring challenges to field-programmable gate array (FPGA) implementations, because complex arithmetics in BN consume too much computing resources. To relax FPGA resource limitations and speed up the computing process, we propose a BNN accelerator architecture based on consolidation compressed tree scheme by combining both XNOR and accumulation operation of the low bit into a systematic one. During the compression process, we adopt 0-padding (not \u00b11) to achieve no-accuracy-loss from software modeling to hardware implementation. Moreover, we introduce shift-addition-BN free binarization technique to shorten the delay path and optimize on-chip storage. To sum up, we drastically cut down the hardware consumption while maintaining great speed performance with the same model complexity as the previous design. We evaluate our accelerator on MNIST and CIFAR-10 dataset and implement the whole system on the ARTIX-7 100T FPGA with speed performance of 2052.65 GOP\/s and area efficiency of 70.15 GOPS\/KLUT.<\/jats:p>","DOI":"10.1145\/3494569","type":"journal-article","created":{"date-parts":[[2022,5,10]],"date-time":"2022-05-10T11:45:32Z","timestamp":1652183132000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["A BNN Accelerator Based on Edge-skip-calculation Strategy and Consolidation Compressed Tree"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6730-3167","authenticated-orcid":false,"given":"Gaoming","family":"Du","sequence":"first","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bangyi","family":"Chen","sequence":"additional","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenmin","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenxing","family":"Tu","sequence":"additional","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junjie","family":"Zhou","sequence":"additional","affiliation":[{"name":"Division of Automated Driving, Chery Automobile Co., Ltd., Wuhu, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shenya","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qinghao","family":"Zhao","sequence":"additional","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongsheng","family":"Yin","sequence":"additional","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaolei","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of VLSI Design, Hefei University of Technology, Hefei, China and IC Design Cooperative Research Center of Ministry of Education, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,5,10]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2016.111"},{"key":"e_1_3_1_3_2","article-title":"Torch7: A matlab-like environment for machine learning","volume":"5","author":"Collobert Ronan","year":"2011","unstructured":"Ronan Collobert, Koray Kavukcuoglu, and Cl\u00e9ment Farabet. 2011. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS workshop, Vol. 5. Granada, 10.","journal-title":"BigLearn, NIPS workshop"},{"key":"e_1_3_1_4_2","unstructured":"Matthieu Courbariaux and Yoshua Bengio. 2016. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or  \\( -1 \\) . Retrieved from https:\/\/arXiv:1602.02830."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2014.7040963"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.23919\/EMCTokyo.2019.8893929"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE48585.2020.9116220"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289602.3293990"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.3013637"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2018.00016"},{"key":"e_1_3_1_11_2","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. Retrieved from https:\/\/arxiv.org\/abs\/1502.03167."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/UCET.2019.8881852"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/IGARSS.2018.8519373"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.09.046"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2019.8714951"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.12.038"},{"key":"e_1_3_1_19_2","first-page":"26","article-title":"Going deeper with embedded FPGA platform for convolutional neural network","author":"Qiu Jiantao","year":"2016","unstructured":"Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, Yu Wang, and Huazhong Yang. 2016. Going deeper with embedded FPGA platform for convolutional neural network. Proceedings of the Conference on Field-programmable Gate Arrays (FPGA\u201916). 26\u201335.","journal-title":"Proceedings of the Conference on Field-programmable Gate Arrays (FPGA\u201916)"},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Mohammad Rastegari Vicente Ordonez et\u00a0al. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. Retrieved from https:\/\/arXiv:1603.05279.","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISBI.2015.7163826"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847276"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT47387.2019.00048"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICOEI.2019.8862698"},{"key":"e_1_3_1_25_2","unstructured":"Yaman Umuroglu Nicholas J. Fraser Giulio Gambardella Michaela Blott Philip Heng Wai Leong Magnus Jahre and Kees A. Vissers. 2016. FINN: A framework for fast scalable binarized neural network inference. Retrieved from https:\/\/arXiv:1612.07119."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.23919\/ICACT48636.2020.9061291"},{"key":"e_1_3_1_27_2","first-page":"1","article-title":"Accurate and fast recovery of network monitoring data with GPU-accelerated tensor completion","author":"Xie K.","year":"2020","unstructured":"K. Xie, Y. Chen, X. Wang, G. Xie, J. Cao, J. Wen, G. Yang, and J. Sun. 2020. Accurate and fast recovery of network monitoring data with GPU-accelerated tensor completion. IEEE\/ACM Trans. Netw. (2020), 1\u201314.","journal-title":"IEEE\/ACM Trans. Netw."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICMEW.2015.7169816"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2017.95"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/APSAR46974.2019.9048264"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021741"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICFPT47387.2019.00043"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3494569","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3494569","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:16Z","timestamp":1750188676000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3494569"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,10]]},"references-count":31,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3494569"],"URL":"https:\/\/doi.org\/10.1145\/3494569","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,10]]},"assertion":[{"value":"2020-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}