{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T17:08:25Z","timestamp":1773248905282,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2024,7,1]],"date-time":"2024-07-01T00:00:00Z","timestamp":1719792000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Chungnam National University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In recent years, deep neural networks (DNNs) have addressed new applications with intelligent autonomy, often achieving higher accuracy than human experts. This capability comes at the expense of the ever-increasing complexity of emerging DNNs, causing enormous challenges while deploying on resource-limited edge devices. Improving the efficiency of DNN hardware accelerators by compression has been explored previously. Existing state-of-the-art studies applied approximate computing to enhance energy efficiency even at the expense of a little accuracy loss. In contrast, bit-serial processing has been used for improving the computational efficiency of neural processing without accuracy loss, exploiting a simple design, dynamic precision adjustment, and computation pruning. This research presents Serial\/Parallel Systolic Array (SPSA) and Octet Serial\/Parallel Systolic Array (OSPSA) processing elements for edge DNN acceleration, which exploit bit-serial processing on systolic array architecture for improving computational efficiency. For evaluation, all designs were described at the RTL level and synthesized in 28 nm technology. 
Post-synthesis cycle-accurate simulations of image classification over DNNs illustrated that, on average, a sample 16 \u00d7 16 systolic array indicated remarkable improvements of 17.6% and 50.6% in energy efficiency compared to the baseline, with no loss of accuracy.<\/jats:p>","DOI":"10.3390\/make6030070","type":"journal-article","created":{"date-parts":[[2024,7,3]],"date-time":"2024-07-03T09:23:59Z","timestamp":1719998639000},"page":"1484-1493","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Enhancing Computation-Efficiency of Deep Neural Network Processing on Edge Devices through Serial\/Parallel Systolic Computing"],"prefix":"10.3390","volume":"6","author":[{"given":"Iraj","family":"Moghaddasi","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Chungnam National University, Daejeon 305-764, Republic of Korea"}]},{"given":"Byeong-Gyu","family":"Nam","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Chungnam National University, Daejeon 305-764, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2024,7,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_3","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. 
arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1145\/3007787.3001177","article-title":"Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks","volume":"44","author":"Chen","year":"2016","journal-title":"ACM SIGARCH Comput. Archit. News"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Villa, O., Johnson, D.R., Oconnor, M., Bolotin, E., Nellans, D., Luitjens, J., Sakharnykh, N., Wang, P., Micikevicius, P., and Scudiero, A. (2014, January 16\u201321). Scaling the power wall: A path to exascale. Proceedings of the SC\u201914: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA.","DOI":"10.1109\/SC.2014.73"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Horowitz, M. (2014, January 9\u201313). Computing\u2019s energy problem (and what we can do about it). Proceedings of the 2014 IEEE International Solid-state Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, USA.","DOI":"10.1109\/ISSCC.2014.6757323"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1016\/j.eng.2020.01.007","article-title":"A Survey of Accelerator Architectures for Deep Neural Networks","volume":"6","author":"Chen","year":"2020","journal-title":"Engineering"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., and Borchers, A. (2017, January 24\u201328). In-datacenter performance analysis of a tensor processing unit. Proceedings of the 44th Annual International Symposium on Computer Architecture, Toronto, ON, Canada.","DOI":"10.1145\/3079856.3080246"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Park, J.S., Jang, J.W., Lee, H., Lee, D., Lee, S., Jung, H., Lee, S., Kwon, S., Jeong, K., and Song, J.H. (2021, January 13\u201322). 
9.5 A 6K-MAC feature-map-sparsity-aware neural processing unit in 5nm flagship mobile SoC. Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA.","DOI":"10.1109\/ISSCC42613.2021.9365928"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2687","DOI":"10.1109\/TC.2022.3141054","article-title":"Thermal-aware design for approximate dnn accelerators","volume":"71","author":"Zervakis","year":"2022","journal-title":"IEEE Trans. Comput."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"89803","DOI":"10.1109\/ACCESS.2023.3300376","article-title":"Dependable DNN Accelerator for Safety-critical Systems: A Review on the Aging Perspective","volume":"11","author":"Moghaddasi","year":"2023","journal-title":"IEEE Access"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1537","DOI":"10.1109\/TC.2021.3092205","article-title":"ComPreEND: Computation pruning through predictive early negative detection for ReLU in a deep neural network accelerator","volume":"71","author":"Kim","year":"2021","journal-title":"IEEE Trans. Comput."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Judd, P., Albericio, J., Hetherington, T., Aamodt, T.M., and Moshovos, A. (2016, January 15\u201319). Stripes: Bit-serial deep neural network computing. Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan.","DOI":"10.1109\/MICRO.2016.7783722"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1109\/JSSC.2018.2865489","article-title":"UNPU: An energy-efficient deep neural network accelerator with fully variable weight bit precision","volume":"54","author":"Lee","year":"2018","journal-title":"IEEE J. 
Solid-State Circuits"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1109\/JSSC.2022.3214064","article-title":"Diana: An end-to-end hybrid digital and analog neural network soc for the edge","volume":"58","author":"Houshmand","year":"2022","journal-title":"IEEE J. Solid-State Circuits"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Eckert, C., Wang, X., Wang, J., Subramaniyan, A., Iyer, R., Sylvester, D., Blaauw, D., and Das, R. (2018, January 1\u20136). Neural cache: Bit-serial in-cache acceleration of deep neural networks. Proceedings of the 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA.","DOI":"10.1109\/ISCA.2018.00040"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, X., Yu, J., Augustine, C., Iyer, R., and Das, R. (2019, January 16\u201320). Bit prudent in-cache acceleration of deep convolutional neural networks. Proceedings of the 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), Washington, DC, USA.","DOI":"10.1109\/HPCA.2019.00029"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1109\/MC.1982.1653825","article-title":"Why systolic architectures?","volume":"15","author":"Kung","year":"1982","journal-title":"Computer"},{"key":"ref_19","unstructured":"Wang, Y.E., Wei, G.-Y., and Brooks, D. (2019). Benchmarking TPU, GPU, and CPU platforms for deep learning. arXiv."},{"key":"ref_20","first-page":"1","article-title":"A Survey of Design and Optimization for Systolic Array-Based DNN Accelerators","volume":"56","author":"Xu","year":"2023","journal-title":"ACM Comput. Surv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Samajdar, A., Joseph, J.M., Zhu, Y., Whatmough, P., Mattina, M., and Krishna, T. (2020, January 23\u201325). A systematic methodology for characterizing scalability of dnn accelerators using scale-sim. 
Proceedings of the 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Boston, MA, USA.","DOI":"10.1109\/ISPASS48437.2020.00016"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1349","DOI":"10.1109\/TCSI.2017.2757036","article-title":"An architecture to accelerate convolution in deep neural networks","volume":"65","author":"Ardakani","year":"2017","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Lu, L., Guan, N., Wang, Y., Jia, L., Luo, Z., Yin, J., Cong, J., and Liang, Y. (2021, January 14\u201318). Tenet: A framework for modeling tensor dataflow based on relation-centric notation. Proceedings of the 2021 ACM\/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain.","DOI":"10.1109\/ISCA52012.2021.00062"},{"key":"ref_24","unstructured":"Chen, Y.-H. (2018). Architecture Design for Highly Flexible and Energy-Efficient Deep Neural Network Accelerators. [Doctoral Dissertation, Massachusetts Institute of Technology]."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3007787.3001138","article-title":"Cnvlutin: Ineffectual-neuron-free deep neural network computing","volume":"44","author":"Albericio","year":"2016","journal-title":"ACM SIGARCH Comput. Archit. News"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"4537","DOI":"10.1007\/s11831-021-09530-9","article-title":"Optimizing Neural Networks for Efficient FPGA Implementation: A Survey","volume":"28","author":"Ayachi","year":"2021","journal-title":"Arch. Comput. Methods Eng."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lu, H., Chang, L., Li, C., Zhu, Z., Lu, S., Liu, Y., and Zhang, M. (2021, January 18\u201322). Distilling bit-level sparsity parallelism for general purpose deep learning acceleration. 
Proceedings of the MICRO-54: 54th Annual IEEE\/ACM International Symposium on Microarchitecture, Virtual.","DOI":"10.1145\/3466752.3480123"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1494","DOI":"10.1109\/TCSI.2021.3138092","article-title":"Tsunami: Triple sparsity-aware ultra energy-efficient neural network training accelerator with multi-modal iterative pruning","volume":"69","author":"Kim","year":"2022","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_29","first-page":"1708","article-title":"Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial","volume":"71","author":"Mao","year":"2023","journal-title":"IEEE Trans. Circuits Syst. II Express Briefs"},{"key":"ref_30","first-page":"2860","article-title":"Heterogeneous systolic array architecture for compact cnns hardware accelerators","volume":"33","author":"Xu","year":"2021","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1109\/TETC.2022.3178730","article-title":"Targeting dnn inference via efficient utilization of heterogeneous precision dnn accelerators","volume":"11","author":"Spantidi","year":"2022","journal-title":"IEEE Trans. Emerg. Top. Comput."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Dai, L., Cheng, Q., Wang, Y., Huang, G., Zhou, J., Li, K., Mao, W., and Yu, H. (2022, January 17\u201320). An energy-efficient bit-split-and-combination systolic accelerator for nas-based multi-precision convolution neural networks. Proceedings of the 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), Taipei, Taiwan.","DOI":"10.1109\/ASP-DAC52403.2022.9712509"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sharma, H., Park, J., Suda, N., Lai, L., Chau, B., Kim, J.K., Chandra, V., and Esmaeilzadeh, H. (2018, January 1\u20136). Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network. 
Proceedings of the 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA.","DOI":"10.1109\/ISCA.2018.00069"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Sharify, S., Lascorz, A.D., Siu, K., Judd, P., and Moshovos, A. (2018, January 24\u201328). Loom: Exploiting weight and activation precisions to accelerate convolutional neural networks. Proceedings of the 55th Annual Design Automation Conference, San Francisco, CA, USA.","DOI":"10.1145\/3195970.3196072"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2045","DOI":"10.1007\/s00034-021-01873-9","article-title":"Bitmac: Bit-serial computation-based efficient multiply-accumulate unit for dnn accelerator","volume":"41","author":"Chhajed","year":"2022","journal-title":"Circuits Syst. Signal Process."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/3\/70\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:08:42Z","timestamp":1760108922000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/3\/70"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,1]]},"references-count":35,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["make6030070"],"URL":"https:\/\/doi.org\/10.3390\/make6030070","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,1]]}}}