{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T01:41:33Z","timestamp":1774662093651,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T00:00:00Z","timestamp":1714435200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62071360, 62121001, 62322117, 62371365, U22B2014"],"award-info":[{"award-number":["62071360, 62121001, 62322117, 62371365, U22B2014"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Youth Talent Promotion Project of China Association for Science and Technology","award":["2020QNRC001"],"award-info":[{"award-number":["2020QNRC001"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Innovation Fund of Xidian University","award":["YJSJ23012"],"award-info":[{"award-number":["YJSJ23012"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>\n            Binary neural network (BNN), where both the weight and the activation values are represented with one bit, provides an attractive alternative to deploy highly efficient deep learning inference on resource-constrained edge devices. 
However, our investigation reveals that, to achieve satisfactory accuracy gains, state-of-the-art (SOTA) BNNs, such as FracBNN and ReActNet, usually have to incorporate various auxiliary floating-point components and increase the model size, which in turn degrades the hardware performance efficiency. In this article, we aim to quantify such hardware inefficiency in SOTA BNNs and further mitigate it with negligible accuracy loss. First, we observe that the auxiliary floating-point (AFP) components consume an average of 93% DSPs, 46% LUTs, and 62% FFs, among the entire BNN accelerator resource utilization. To mitigate such overhead, we propose a novel algorithm-hardware co-design, called\n            <jats:italic>FuseBNN<\/jats:italic>\n            , to fuse those AFP operators without hurting the accuracy. On average, FuseBNN reduces AFP resource utilization to 59% DSPs, 13% LUTs, and 16% FFs. Second, SOTA BNNs often use the compact MobileNetV1 as the backbone network but have to replace the lightweight 3 \u00d7 3 depth-wise convolution (DWC) with the 3 \u00d7 3 standard convolution (SC, e.g., in ReActNet and our ReActNet-adapted BaseBNN) or even more complex fractional 3 \u00d7 3 SC (e.g., in FracBNN) to bridge the accuracy gap. As a result, the model parameter size is significantly increased and becomes 2.25\u00d7 larger than that of the 4-bit direct quantization with the original DWC (4-Bit-Net); the number of multiply-accumulate operations is also significantly increased so that the overall LUT resource usage of BaseBNN is almost the same as that of 4-Bit-Net. To address this issue, we propose\n            <jats:italic>HyBNN<\/jats:italic>\n            , where we binarize depth-wise separation convolution (DSC) blocks for the first time to decrease the model size and incorporate 4-bit DSC blocks to compensate for the accuracy loss. 
For the ship detection task in synthetic aperture radar imagery on the AMD-Xilinx ZCU102 FPGA, HyBNN achieves a detection accuracy of 94.8% and a detection speed of 615 frames per second (FPS), which is 6.8\u00d7 faster than FuseBNN+ (94.9% accuracy) and 2.7\u00d7 faster than 4-Bit-Net (95.9% accuracy). For image classification on the CIFAR-10 dataset on the AMD-Xilinx Ultra96-V2 FPGA, HyBNN achieves 1.5\u00d7 speedup and 0.7% better accuracy over SOTA FracBNN.\n          <\/jats:p>","DOI":"10.1145\/3631610","type":"journal-article","created":{"date-parts":[[2023,11,7]],"date-time":"2023-11-07T12:14:16Z","timestamp":1699359256000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["HyBNN: Quantifying and Optimizing Hardware Efficiency of Binary Neural Networks"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6921-1007","authenticated-orcid":false,"given":"Geng","family":"Yang","sequence":"first","affiliation":[{"name":"State Key Lab of Integrated Services Networks, Xidian University, Xi\u2019an, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0851-6565","authenticated-orcid":false,"given":"Jie","family":"Lei","sequence":"additional","affiliation":[{"name":"State Key Lab of Integrated Services Networks, Xidian University, Xi\u2019an, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0603-9697","authenticated-orcid":false,"given":"Zhenman","family":"Fang","sequence":"additional","affiliation":[{"name":"School of Engineering Science, Simon Fraser University, Burnaby, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0234-6270","authenticated-orcid":false,"given":"Yunsong","family":"Li","sequence":"additional","affiliation":[{"name":"State Key Lab of Integrated Services Networks, Xidian University, Xi\u2019an, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3111-1116","authenticated-orcid":false,"given":"Jiaqing","family":"Zhang","sequence":"additional","affiliation":[{"name":"State Key Lab of Integrated Services Networks, Xidian University, Xi\u2019an, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8310-024X","authenticated-orcid":false,"given":"Weiying","family":"Xie","sequence":"additional","affiliation":[{"name":"State Key Lab of Integrated Services Networks, Xidian University, Xi\u2019an, China"}]}],"member":"320","published-online":{"date-parts":[[2024,4,30]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"AMD-Xilinx. 2021. Introduction to Vitis HLS. Retrieved July 28 2022 from https:\/\/docs.xilinx.com\/r\/2020.2-English\/ug1399-vitis-hls\/Introduction-to-Vitis-HLS"},{"key":"e_1_3_1_3_2","unstructured":"Joseph Bethge Christian Bartz Haojin Yang Ying Chen and Christoph Meinel. 2020. MeliusNet: Can binary neural networks achieve MobileNet-level accuracy? CoRR abs\/2001.05936 (2020). Retrieved from https:\/\/arxiv.org\/abs\/2001.05936"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},{"key":"e_1_3_1_5_2","unstructured":"Matthieu Courbariaux and Yoshua Bengio. 2016. BinaryNet: Training deep neural networks with weights and activations constrained to \\(+1\\) or \\(-1\\). CoRR abs\/1602.02830 (2016). Retrieved from http:\/\/arxiv.org\/abs\/1602.02830"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.2976475"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2018.00018"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2018.00016"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_11_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. 
MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs\/1704.04861 (2017). Retrieved from http:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_1_13_2","first-page":"448","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning. PMLR, 448\u2013456."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/BIGSARDATA.2017.8124934"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.09.046"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00489"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58568-6_9"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_44"},{"key":"e_1_3_1_19_2","unstructured":"Ping Luo Xinjiang Wang Wenqi Shao and Zhanglin Peng. 2018. Towards understanding regularization in batch normalization. International Conference on Learning Representations."},{"key":"e_1_3_1_20_2","unstructured":"Brais Mart\u00ednez Jing Yang Adrian Bulat and Georgios Tzimiropoulos. 2020. Training binary neural networks with real-to-binary convolutions. CoRR abs\/2003.11535 (2020). 
Retrieved from https:\/\/arxiv.org\/abs\/2003.11535"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPT.2016.7929192"},{"key":"e_1_3_1_22_2","first-page":"12","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas K\u00f6pf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc., 12 pages."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107281"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.690"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490422.3502364"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6900"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3117908"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","unstructured":"Geng Yang Jie Lei Weiying Xie Zhenman Fang Yunsong Li Jiaxuan Wang and Xin Zhang. 2022. Algorithm\/Hardware codesign for real-time on-satellite CNN-based ship detection in SAR imagery. IEEE Transactions on Geoscience and Remote Sensing 60 (2022). 
DOI:10.1109\/TGRS.2022.316149","DOI":"10.1109\/TGRS.2022.316149"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3431920.3439296"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01215"},{"key":"e_1_3_1_33_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhang Yichi","year":"2020","unstructured":"Yichi Zhang, Ritchie Zhao, Weizhe Hua, Nayun Xu, G. Edward Suh, and Zhiru Zhang. 2020. Precision gating: Improving neural network efficiency with dynamic dual-precision activations. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=SJgVU0EKwS"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021741"},{"key":"e_1_3_1_35_2","unstructured":"Xiandong Zhao Ying Wang Xuyi Cai Cheng Liu and Lei Zhang. 2020. Linear symmetric quantization of neural networks for low-precision integer hardware. International Conference on Learning Representations (2020)."},{"key":"e_1_3_1_36_2","unstructured":"Shuchang Zhou Zekun Ni Xinyu Zhou He Wen Yuxin Wu and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. CoRR abs\/1606.06160 (2016). 
Retrieved from http:\/\/arxiv.org\/abs\/1606.06160"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00506"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631610","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3631610","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:53Z","timestamp":1750291433000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631610"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,30]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3631610"],"URL":"https:\/\/doi.org\/10.1145\/3631610","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,30]]},"assertion":[{"value":"2023-06-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-10-29","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}