{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T16:58:06Z","timestamp":1778605086024,"version":"3.51.4"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,9,30]],"date-time":"2022-09-30T00:00:00Z","timestamp":1664496000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001459","name":"Ministry of Education, Singapore","doi-asserted-by":"crossref","award":["MOE2019-T2-1-071, and MOE2019-T1-001-072"],"award-info":[{"award-number":["MOE2019-T2-1-071, and MOE2019-T1-001-072"]}],"id":[{"id":"10.13039\/501100001459","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001475","name":"Nanyang Technological University, Singapore","doi-asserted-by":"crossref","award":["NAP (M4082282), and SUG (M4082087)"],"award-info":[{"award-number":["NAP (M4082282), and SUG (M4082087)"]}],"id":[{"id":"10.13039\/501100001475","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>Ternary Neural Networks (TNNs) and mixed-precision Ternary Binary Networks (TBNs) have demonstrated higher accuracy compared to Binary Neural Networks (BNNs) while providing fast, low-power, and memory-efficient inference. Related works have improved the accuracy of TNNs and TBNs, but overlooked their optimizations on CPU and GPU platforms. First, there is no unified encoding for the binary and ternary values in TNNs and TBNs. Second, existing works store the 2-bit quantized data sequentially in 32\/64-bit integers, resulting in bit-extraction overhead. 
Last, adopting standard 2-bit multiplications for ternary values leads to a complex computation pipeline, and efficient mixed-precision multiplication between ternary and binary values is unavailable.<\/jats:p>\n          <jats:p>\n            In this article, we propose TAB as a unified and optimized inference method for ternary, binary, and mixed-precision neural networks. TAB includes a unified value representation, an efficient data storage scheme, and novel bitwise dot product pipelines on CPU\/GPU platforms. We adopt signed integers for consistent value representation across binary and ternary values. We introduce a bitwidth-last data format that stores the first and second bits of the ternary values separately to remove the bit-extraction overhead. We design the ternary and binary bitwise dot product pipelines based on Gated-XOR, using up to 40% fewer operations than\n            <jats:bold>State-Of-The-Art (SOTA)<\/jats:bold>\n            methods.\n          <\/jats:p>\n          <jats:p>\n            Theoretical speedup analysis shows that our proposed TAB-TNN is 2.3\u00d7 as fast as the SOTA ternary method RTN, 9.8\u00d7 as fast as 8-bit integer quantization (INT8), and 39.4\u00d7 as fast as 32-bit full-precision convolution (FP32). Experimental results on CPU and GPU platforms show that our TAB-TNN achieves up to 34.6\u00d7 speedup and 16\u00d7 storage size reduction compared with FP32 layers. TBN, Binary-activation Ternary-weight Network (BTN), and BNN in TAB are up to 40.7\u00d7, 56.2\u00d7, and 72.2\u00d7 as fast as FP32. TAB-TNN is up to 70.1% faster and 12.8% more power-efficient than RTN on Darknet-19 while keeping the same accuracy. 
TAB is open source as a PyTorch Extension\n            <jats:xref ref-type=\"fn\">\n              <jats:sup>1<\/jats:sup>\n            <\/jats:xref>\n            for easy integration with existing CNN models.\n          <\/jats:p>","DOI":"10.1145\/3508390","type":"journal-article","created":{"date-parts":[[2022,1,26]],"date-time":"2022-01-26T18:01:53Z","timestamp":1643220113000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["TAB: Unified and Optimized Ternary, Binary, and Mixed-precision Neural Network Inference on the Edge"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2094-7643","authenticated-orcid":false,"given":"Shien","family":"Zhu","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore"}]},{"given":"Luan H. K.","family":"Duong","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9348-4662","authenticated-orcid":false,"given":"Weichen","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2022,10,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-41321-1_2"},{"key":"e_1_3_2_3_2","volume-title":"Instruction tables","author":"Denmark Agner Fog, Technical University of","year":"2020","unstructured":"Agner Fog, Technical University of Denmark. 2020. Instruction tables. Retrieved from https:\/\/www.agner.org\/optimize\/instruction_tables.pdf."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966166"},{"key":"e_1_3_2_5_2","unstructured":"Apache (incubating) TVM. 2021. tvm.relay.nn.bitserial_conv2d. 
Retrieved from https:\/\/tvm.apache.org\/docs\/api\/python\/relay\/nn.html#tvm.relay.nn.bitserial_conv2d."},{"key":"e_1_3_2_6_2","article-title":"wav2vec 2.0: A framework for self-supervised learning of speech representations","volume":"33","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 33 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_7_2","article-title":"Training competitive binary neural networks from scratch","author":"Bethge Joseph","year":"2018","unstructured":"Joseph Bethge, Marvin Bornstein, Adrian Loy, Haojin Yang, and Christoph Meinel. 2018. Training competitive binary neural networks from scratch. ArXiv e-prints (2018). arxiv:1812.01965.","journal-title":"ArXiv e-prints"},{"key":"e_1_3_2_8_2","article-title":"FATNN: Fast and accurate ternary neural networks","author":"Chen Peng","year":"2020","unstructured":"Peng Chen, Bohan Zhuang, and Chunhua Shen. 2020. FATNN: Fast and accurate ternary neural networks. arXiv preprint arXiv:2008.05101 (2020).","journal-title":"arXiv preprint arXiv:2008.05101"},{"key":"e_1_3_2_9_2","first-page":"578","volume-title":"13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). 
578\u2013594."},{"key":"e_1_3_2_10_2","article-title":"PACT: Parameterized clipping activation for quantized neural networks","author":"Choi Jungwook","year":"2018","unstructured":"Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I.-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018. PACT: Parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085 (2018).","journal-title":"arXiv preprint arXiv:1805.06085"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2018.01.010"},{"key":"e_1_3_2_12_2","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).","journal-title":"arXiv preprint arXiv:1810.04805"},{"key":"e_1_3_2_13_2","article-title":"A survey of quantization methods for efficient neural network inference","author":"Gholami Amir","year":"2021","unstructured":"Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630 (2021).","journal-title":"arXiv preprint arXiv:2103.13630"},{"key":"e_1_3_2_14_2","article-title":"A survey on methods and theories of quantized neural networks","author":"Guo Yunhui","year":"2018","unstructured":"Yunhui Guo. 2018. A survey on methods and theories of quantized neural networks. arXiv preprint arXiv:1808.04752 (2018).","journal-title":"arXiv preprint arXiv:1808.04752"},{"key":"e_1_3_2_15_2","article-title":"Learning both weights and connections for efficient neural network","volume":"28","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally. 2015. 
Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 28 (2015).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_16_2","first-page":"244","volume-title":"IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Hu Y.","year":"2018","unstructured":"Y. Hu, J. Zhai, D. Li, Y. Gong, Y. Zhu, W. Liu, L. Su, and J. Jin. 2018. BitFlow: Exploiting vector parallelism for binary neural networks on CPU. In IEEE International Parallel and Distributed Processing Symposium (IPDPS). 244\u2013253."},{"key":"e_1_3_2_17_2","article-title":"Binarized neural networks","volume":"29","author":"Hubara Itay","year":"2016","unstructured":"Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks. Adv. Neural Inf. Process. Syst. 29 (2016).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2020.2993045"},{"key":"e_1_3_2_19_2","volume-title":"daBNN GitHub","author":"Vision JDAI Computer","year":"2020","unstructured":"JDAI Computer Vision. 2020. daBNN GitHub. Retrieved from https:\/\/github.com\/JDAI-CV\/dabnn\/blob\/master\/README_CN.md."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00448"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/292395.292412"},{"key":"e_1_3_2_22_2","article-title":"A study of BFLOAT16 for deep learning training","author":"Kalamkar Dhiraj","year":"2019","unstructured":"Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, et\u00a0al. 2019. A study of BFLOAT16 for deep learning training. arXiv preprint arXiv:1905.12322 (2019).","journal-title":"arXiv preprint arXiv:1905.12322"},{"key":"e_1_3_2_23_2","unstructured":"Kim Walisch. 2021. libpopcnt. 
Retrieved from https:\/\/github.com\/kimwalisch\/libpopcnt."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i10.17036"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5912"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-01970-8_89"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.04.141"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00273"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_44"},{"key":"e_1_3_2_31_2","unstructured":"Dukhan Marat Wu Yiming Lu Hao and Maher Bert. 2019. QNNPACK. Retrieved from https:\/\/github.com\/pytorch\/QNNPACK."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2974843"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics10080886"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00357"},{"key":"e_1_3_2_35_2","volume-title":"How To Optimize Gemm","author":"Geijn Robert van de","year":"2018","unstructured":"Robert van de Geijn. 2018. How To Optimize Gemm. Retrieved from https:\/\/github.com\/flame\/how-to-optimize-gemm."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00232"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_3_2_38_2","unstructured":"RuiDeng Technologies. 2020. UM25C USB Tester Meter Instructions. Retrieved from https:\/\/phuketshopper.com\/software\/UM25C\/UM25C%20USB%20tester%20meter%20Instructions.pdf."},{"key":"e_1_3_2_39_2","volume-title":"Deep Learning Performance Boost by Intel VNNI","author":"Shen Shufan Wu, Feng Tian, Haihao","year":"2019","unstructured":"Shufan Wu, Feng Tian, Haihao Shen. 2019. Deep Learning Performance Boost by Intel VNNI. 
Retrieved from https:\/\/www.intel.com\/content\/www\/us\/en\/artificial-intelligence\/posts\/deep-learning-performance-boost-by-intel-vnni.html."},{"key":"e_1_3_2_40_2","volume-title":"8-bit Inference with TensorRT","author":"NVIDIA Szymon Migacz,","year":"2017","unstructured":"Szymon Migacz, NVIDIA. 2017. 8-bit Inference with TensorRT. Retrieved from https:\/\/on-demand.gputechconf.com\/gtc\/2017\/presentation\/s7310-8-bit-inference-with-tensorrt.pdf."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"e_1_3_2_42_2","volume-title":"NCNN GitHub","author":"Limited THL A29","year":"2020","unstructured":"THL A29 Limited. 2020. NCNN GitHub. Retrieved from https:\/\/github.com\/Tencent\/ncnn."},{"key":"e_1_3_2_43_2","article-title":"FQ-Conv: Fully quantized convolution for efficient and accurate inference","author":"Verhoef Bram-Ernst","year":"2019","unstructured":"Bram-Ernst Verhoef, Nathan Laubeuf, Stefan Cosemans, Peter Debacker, Ioannis Papistas, Arindam Mallik, and Diederik Verkest. 2019. FQ-Conv: Fully quantized convolution for efficient and accurate inference. arXiv preprint arXiv:1912.09356 (2019).","journal-title":"arXiv preprint arXiv:1912.09356"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_20"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00066"},{"key":"e_1_3_2_46_2","article-title":"Integer quantization for deep learning inference: Principles and empirical evaluation","author":"Wu Hao","year":"2020","unstructured":"Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, and Paulius Micikevicius. 2020. Integer quantization for deep learning inference: Principles and empirical evaluation. 
arXiv preprint arXiv:2004.09602 (2020).","journal-title":"arXiv preprint arXiv:2004.09602"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3129393"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_23"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350534"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i4.16462"},{"key":"e_1_3_2_51_2","volume-title":"International Conference on Learning Representations","author":"Zhao Xiandong","year":"2020","unstructured":"Xiandong Zhao, Ying Wang, Xuyi Cai, Cheng Liu, and Lei Zhang. 2020. Linear symmetric quantization of neural networks for low-precision integer hardware. In International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=H1lBj2VFPS."},{"key":"e_1_3_2_52_2","article-title":"DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients","author":"Zhou Shuchang","year":"2016","unstructured":"Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. 
arXiv preprint arXiv:1606.06160 (2016).","journal-title":"arXiv preprint arXiv:1606.06160"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00204"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS51040.2020.00026"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3508390","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3508390","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:36Z","timestamp":1750182576000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3508390"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,30]]},"references-count":53,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3508390"],"URL":"https:\/\/doi.org\/10.1145\/3508390","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,30]]},"assertion":[{"value":"2021-07-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-26","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}