{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T05:04:38Z","timestamp":1769922278173,"version":"3.49.0"},"reference-count":35,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,2,6]],"date-time":"2022-02-06T00:00:00Z","timestamp":1644105600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003621","name":"Ministry of Science ICT and Future Planning","doi-asserted-by":"publisher","award":["IITP-2020-0-01462"],"award-info":[{"award-number":["IITP-2020-0-01462"]}],"id":[{"id":"10.13039\/501100003621","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The convergence of artificial intelligence (AI) is one of the critical technologies in the recent fourth industrial revolution. The AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that aids rapid and secure data processing. While the success of AIoT demanded low-power neural network processors, most of the recent research has been focused on accelerator designs only for inference. The growing interest in self-supervised and semi-supervised learning now calls for processors offloading the training process in addition to the inference process. Incorporating training with high accuracy goals requires the use of floating-point operators. The higher precision floating-point arithmetic architectures in neural networks tend to consume a large area and energy. Consequently, an energy-efficient\/compact accelerator is required. The proposed architecture incorporates training in 32 bits, 24 bits, 16 bits, and mixed precisions to find the optimal floating-point format for low power and smaller-sized edge device. 
The proposed accelerator engines have been verified on FPGA for both inference and training of the MNIST image dataset. The combination of 24-bit custom FP format with 16-bit Brain FP has achieved an accuracy of more than 93%. ASIC implementation of this optimized mixed-precision accelerator using TSMC 65nm reveals an active area of 1.036 \u00d7 1.036 mm2 and energy consumption of 4.445 \u00b5J per training of one image. Compared with 32-bit architecture, the size and the energy are reduced by 4.7 and 3.91 times, respectively. Therefore, the CNN structure using floating-point numbers with an optimized data path will significantly contribute to developing the AIoT field that requires a small area, low energy, and high accuracy.<\/jats:p>","DOI":"10.3390\/s22031230","type":"journal-article","created":{"date-parts":[[2022,2,6]],"date-time":"2022-02-06T20:40:18Z","timestamp":1644180018000},"page":"1230","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0500-904X","authenticated-orcid":false,"given":"Muhammad","family":"Junaid","sequence":"first","affiliation":[{"name":"Department of Electronics, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4038-462X","authenticated-orcid":false,"given":"Saad","family":"Arslan","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad 45550, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"TaeGeon","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Electronics, College of Electrical and 
Computer Engineering, Chungbuk National University, Cheongju 28644, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2602-2075","authenticated-orcid":false,"given":"HyungWon","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Electronics, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Liu, Z., Liu, Z., Ren, E., Luo, L., Wei, Q., Wu, X., Li, X., Qiao, F., and Liu, X.J. (2019, January 15\u201317). A 1.8mW Perception Chip with Near-Sensor Processing Scheme for Low-Power AIoT Applications. Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Miami, FL, USA.","DOI":"10.1109\/ISVLSI.2019.00087"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"82721","DOI":"10.1109\/ACCESS.2019.2924045","article-title":"A Survey on IoT Security: Application Areas, Security Threats, and Solution Architectures","volume":"7","author":"Hassija","year":"2019","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"105414","DOI":"10.1016\/j.nanoen.2020.105414","article-title":"Technology evolution from self-powered sensors to AIoT enabled smart homes","volume":"79","author":"Dong","year":"2020","journal-title":"Nano Energy"},{"key":"ref_4","first-page":"1534","article-title":"A ReRAM-Based Computing-in-Memory Convolutional-Macro With Customized 2T2R Bit-Cell for AIoT Chip IP Applications","volume":"67","author":"Tan","year":"2020","journal-title":"IEEE Trans. Circuits Syst. II: Express Briefs"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, Z., Le, Y., Liu, Y., Zhou, P., Tan, Z., Fan, H., Zhang, Y., Ru, J., Wang, Y., and Huang, R. (2021, January 13\u201322). 
12.1 A 148nW General-Purpose Event-Driven Intelligent Wake-Up Chip for AIoT Devices Using Asynchronous Spike-Based Feature Extractor and Convolutional Neural Network. Proceedings of the 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA.","DOI":"10.1109\/ISSCC42613.2021.9365816"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/JIOT.2021.3095077","article-title":"A Survey on Federated Learning for Resource-Constrained IoT Devices","volume":"9","author":"Imteaj","year":"2021","journal-title":"IEEE Internet Things J."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lane, N.D., Bhattacharya, S., Georgiev, P., Forlivesi, C., Jiao, L., Qendro, L., and Kawsar, F. (2016, January 11\u201314). DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices. Proceedings of the 2016 15th ACM\/IEEE International Conference on Information Processing in Sensor Networks (IPSN), Vienna, Austria.","DOI":"10.1109\/IPSN.2016.7460664"},{"key":"ref_8","unstructured":"Venkataramanaiah, S.K., Ma, Y., Yin, S., Nurvithadhi, E., Dasu, A., Cao, Y., and Seo, J.-S. (2019, January 8\u201312). Automatic Compiler Based FPGA Accelerator for CNN Training. Proceedings of the 2019 29th International Conference on Field Programmable Logic and Applications (FPL), Barcelona, Spain."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lu, J., Lin, J., and Wang, Z. (2020, January 20\u201322). A Reconfigurable DNN Training Accelerator on FPGA. Proceedings of the 2020 IEEE Workshop on Signal Processing Systems (SiPS), Coimbra, Portugal.","DOI":"10.1109\/SiPS50750.2020.9195234"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N.R., Ganger, G.R., Gibbons, P.B., and Zaharia, M. (2019, January 27\u201330). PipeDream: Generalized Pipeline Parallelism for DNN Training. 
Proceedings of the 27th ACM Symposium on Operating Systems Principles, Huntsville, ON, Canada.","DOI":"10.1145\/3341301.3359646"},{"key":"ref_11","unstructured":"Jeremy, F.O., Kalin, P., Michael, M., Todd, L., Ming, L., Danial, A., Shlomi, H., Michael, A., Logan, G., and Mahdi, H. (2018, January 1\u20136). A Configurable Cloud-Scale DNN Processor for Real-Time AI. Proceedings of the 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Asghar, M.S., Arslan, S., and Kim, H. (2021). A Low-Power Spiking Neural Network Chip Based on a Compact LIF Neuron and Binary Exponential Charge Injector Synapse Circuits. Sensors, 21.","DOI":"10.3390\/s21134462"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"99","DOI":"10.3389\/fncom.2015.00099","article-title":"Unsupervised learning of digit recognition using spike-timing-dependent plasticity","volume":"9","author":"Diehl","year":"2015","journal-title":"Front. Comput. Neurosci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2814","DOI":"10.1021\/acsnano.6b07894","article-title":"Pattern recognition using carbon nanotube synaptic transistors with an adjustable weight update protocol","volume":"11","author":"Kim","year":"2017","journal-title":"ACS Nano"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"4782","DOI":"10.1109\/TNNLS.2017.2778940","article-title":"High-performance mixed-signal neurocomputing with nanoscale floating-gate memory cell arrays","volume":"29","author":"Guo","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1220","DOI":"10.1109\/LED.2017.2731859","article-title":"Linking conductive filament properties and evolution to synaptic behavior of RRAM devices for neuromorphic applications","volume":"38","author":"Woo","year":"2017","journal-title":"IEEE Electron. 
Device Lett."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"81370","DOI":"10.1109\/ACCESS.2019.2923822","article-title":"ADAS Acceptability Improvement Based on Self-Learning of Individual Driving Characteristics: A Case Study of Lane Change Warning System","volume":"7","author":"Sun","year":"2019","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Park, D., Kim, S., An, Y., and Jung, J.-Y. (2018). LiReD: A Light-Weight Real-Time Fault Detection System for Edge Computing Using LSTM Recurrent Neural Networks. Sensors, 18.","DOI":"10.3390\/s18072110"},{"key":"ref_19","unstructured":"Kumar, A., Goyal, S., and Varma, M. (2017, January 6\u201311). Resource-efficient machine learning in 2 KB RAM for the Internet of Things. Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1109\/JETCAS.2018.2842761","article-title":"Integer Convolutional Neural Network for Seizure Detection","volume":"8","author":"Truong","year":"2018","journal-title":"IEEE J. Emerg. Sel. Top. Circuits Syst."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1109\/TVLSI.2019.2935251","article-title":"An Energy-Efficient Deep Convolutional Neural Network Inference Processor With Enhanced Output Stationary Dataflow in 65-Nm CMOS","volume":"28","author":"Sim","year":"2020","journal-title":"IEEE Trans. VLSI Syst."},{"key":"ref_22","unstructured":"Das, D., Mellempudi, N., Mudigere, D., Kalamkar, D., Avancha, S., Banerjee, K., Sridharan, S., Vaidyanathan, K., Kaul, B., and Georganas, E. (2018). Mixed precision training of convolutional neural networks using integer operations. arXiv."},{"key":"ref_23","unstructured":"Gupta, S., Agrawal, A., Gopalakrishnan, K., and Narayanan, P. (2015, January 6\u201311). Deep learning with limited numerical precision. 
Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Fleischer, B., Shukla, S., Ziegler, M., Silberman, J., Oh, J., Srinivasan, V., Choi, J., Mueller, S., Agrawal, A., and Babinsky, T. (2018, January 18\u201322). A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference. Proceedings of the 2018 IEEE Symposium on VLSI Circuits, Honolulu, HI, USA.","DOI":"10.1109\/VLSIC.2018.8502276"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1874","DOI":"10.1109\/TVLSI.2019.2913958","article-title":"High-Performance FPGA-Based CNN Accelerator With Block-Floating-Point Arithmetic","volume":"27","author":"Lian","year":"2019","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Iwata, A., Yoshida, Y., Matsuda, S., Sato, Y., and Suzumura, N. (1989, January 18\u201322). An artificial neural network accelerator using general purpose 24 bit floating point digital signal processors. Proceedings of the International 1989 Joint Conference on Neural Networks, Washington, DC, USA.","DOI":"10.1109\/IJCNN.1989.118695"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, X., Liu, S., Zhang, R., Liu, C., Huang, D., Zhou, S., Guo, J., Guo, Q., Du, Z., and Zhi, T. (2020, January 13\u201319). Fixed-Point Back-Propagation Training. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00240"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Mujawar, S., Kiran, D., and Ramasangu, H. (2018, January 9\u201310). An Efficient CNN Architecture for Image Classification on FPGA Accelerator. 
Proceedings of the 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC), Bengaluru, India.","DOI":"10.1109\/ICAECC.2018.8479517"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, C.-Y., Choi, J., Gopalakrishnan, K., Srinivasan, V., and Venkataramani, S. (2018, January 19\u201323). Exploiting approximate computing for deep learning acceleration. Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany.","DOI":"10.23919\/DATE.2018.8342119"},{"key":"ref_30","unstructured":"Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaev, O., and Venkatesh, G. (2017). Mixed precision training. arXiv."},{"key":"ref_31","unstructured":"Christopher, B.M. (2006). Pattern Recognition and Machine Learning, Springer."},{"key":"ref_32","unstructured":"(2019). IEEE Standard for Floating-Point Arithmetic (Standard No. IEEE Std 754-2019 (Revision of IEEE 754-2008))."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hong, J., Arslan, S., Lee, T., and Kim, H. (2021). Design of Power-Efficient Training Accelerator for Convolution Neural Networks. Electronics, 10.","DOI":"10.3390\/electronics10070787"},{"key":"ref_34","unstructured":"Zhao, W., Fu, H., Luk, W., Yu, T., Wang, S., Feng, B., Ma, Y., and Yang, G. (2016, January 6\u20138). F-CNN: An FPGA-Based Framework for Training Convolutional Neural Networks. Proceedings of the 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP), London, UK."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2621","DOI":"10.1109\/TVLSI.2013.2294916","article-title":"Minitaur, an Event-Driven FPGA-Based Spiking Network Accelerator","volume":"22","author":"Neil","year":"2014","journal-title":"IEEE Trans. Very Large-Scale Integr. 
(VLSI) Syst."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/3\/1230\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:14:53Z","timestamp":1760134493000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/3\/1230"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,6]]},"references-count":35,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["s22031230"],"URL":"https:\/\/doi.org\/10.3390\/s22031230","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,6]]}}}