{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:24:23Z","timestamp":1760235863493,"version":"build-2065373602"},"reference-count":27,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2021,9,28]],"date-time":"2021-09-28T00:00:00Z","timestamp":1632787200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The Laboratory Open Fund of Beijing Smart-chip Microelectronics Technology Co., Ltd","award":["SGTYHT\/20-JS-221"],"award-info":[{"award-number":["SGTYHT\/20-JS-221"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Extracting features from sensing data on edge devices is a challenging application for which deep neural networks (DNN) have shown promising results. Unfortunately, the general micro-controller-class processors which are widely used in sensing system fail to achieve real-time inference. Accelerating the compute-intensive DNN inference is, therefore, of utmost importance. As the physical limitation of sensing devices, the design of processor needs to meet the balanced performance metrics, including low power consumption, low latency, and flexible configuration. In this paper, we proposed a lightweight pipeline integrated deep learning architecture, which is compatible with open-source RISC-V instructions. The dataflow of DNN is organized by the very long instruction word (VLIW) pipeline. It combines with the proposed special intelligent enhanced instructions and the single instruction multiple data (SIMD) parallel processing unit. 
Experimental results show that total power consumption is about 411 mW and the power efficiency is about 320.7 GOPS\/W.<\/jats:p>","DOI":"10.3390\/s21196491","type":"journal-article","created":{"date-parts":[[2021,9,28]],"date-time":"2021-09-28T21:39:29Z","timestamp":1632865169000},"page":"6491","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["A Heterogeneous RISC-V Processor for Efficient DNN Application in Smart Sensing System"],"prefix":"10.3390","volume":"21","author":[{"given":"Haifeng","family":"Zhang","sequence":"first","affiliation":[{"name":"National & Local Joint Engineering Research Center for Reliability Technology of Energy Internet Intelligent Terminal Core Chip, Beijing Smart-Chip Microelectronics Technology Co., Ltd., Beijing 100192, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoti","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Cybersecurity, Northwestern Polytechnical University, Xi\u2019an 710072, China"},{"name":"Engineering and Research Center of Embedded Systems Integration (Ministry of Education), Xi\u2019an 710129, China"},{"name":"National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi\u2019an 710129, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuyu","family":"Du","sequence":"additional","affiliation":[{"name":"Engineering and Research Center of Embedded Systems Integration (Ministry of Education), Xi\u2019an 710129, China"},{"name":"School of Computer Science, Northwestern Polytechnical University, Xi\u2019an 710129, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongqing","family":"Guo","sequence":"additional","affiliation":[{"name":"Engineering and Research Center of Embedded Systems Integration (Ministry of Education), Xi\u2019an 710129, China"},{"name":"School of Software, Northwestern Polytechnical 
University, Xi\u2019an 710129, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chuxi","family":"Li","sequence":"additional","affiliation":[{"name":"Engineering and Research Center of Embedded Systems Integration (Ministry of Education), Xi\u2019an 710129, China"},{"name":"National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi\u2019an 710129, China"},{"name":"School of Computer Science, Northwestern Polytechnical University, Xi\u2019an 710129, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yidong","family":"Yuan","sequence":"additional","affiliation":[{"name":"National & Local Joint Engineering Research Center for Reliability Technology of Energy Internet Intelligent Terminal Core Chip, Beijing Smart-Chip Microelectronics Technology Co., Ltd., Beijing 100192, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0637-249X","authenticated-orcid":false,"given":"Meng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Engineering and Research Center of Embedded Systems Integration (Ministry of Education), Xi\u2019an 710129, China"},{"name":"National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi\u2019an 710129, China"},{"name":"School of Computer Science, Northwestern Polytechnical University, Xi\u2019an 710129, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shengbing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Engineering and Research Center of Embedded Systems Integration (Ministry of Education), Xi\u2019an 710129, China"},{"name":"National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi\u2019an 710129, China"},{"name":"School of Computer Science, Northwestern Polytechnical University, Xi\u2019an 710129, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"102094","DOI":"10.1016\/j.sysarc.2021.102094","article-title":"Balancing memory-accessing and computing over sparse DNN accelerator via efficient data packaging","volume":"117","author":"Wang","year":"2021","journal-title":"J. Syst. Archit."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Chen, Y., Krishna, T., Emer, J., and Sze, V. (February, January 31). 14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. Proceedings of the 2016 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA.","DOI":"10.1109\/ISSCC.2016.7418007"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., and Sun, N. (2014, January 13\u201317). DaDianNao: A Machine-Learning Supercomputer. Proceedings of the 2014 47th Annual IEEE\/ACM International Symposium on Microarchitecture, Cambridge, UK.","DOI":"10.1109\/MICRO.2014.58"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1145\/2786763.2694358","article-title":"PuDianNao: A Polyvalent Machine Learning Accelerator","volume":"43","author":"Liu","year":"2015","journal-title":"SIGARCH Comput. Archit. News"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., and Temam, O. (2015, January 13\u201317). ShiDianNao: Shifting vision processing closer to the sensor. Proceedings of the 2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), Portland, OR, USA.","DOI":"10.1145\/2749469.2750389"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, S., Du, Z., Tao, J., Han, D., Luo, T., Xie, Y., Chen, Y., and Chen, T. (2016, January 18\u201322). 
Cambricon: An instruction set architecture for neural networks. Proceedings of the 43rd International Symposium on Computer Architecture (ISCA\u201916), Seoul, Korea.","DOI":"10.1109\/ISCA.2016.42"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, S., Du, Z., Zhang, L., Lan, H., Liu, S., Li, L., Guo, Q., Chen, T., and Chen, Y. (2016, January 15\u201319). Cambricon-x: An accelerator for sparse neural networks. Proceedings of the 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-49), Taipei, Taiwan.","DOI":"10.1109\/MICRO.2016.7783723"},{"key":"ref_8","unstructured":"Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., Emer, J.S., Keckler, S.W., and Dally, W.J. (2017, January 24\u201328). SCNN: An accelerator for compressed-sparse convolutional neural networks. Proceedings of the 2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), Toronto, ON, Canada."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sun, F., Wang, C., Gong, L., Xu, C., Zhang, Y., Lu, Y., Li, X., and Zhou, X. (2017, January 12\u201315). A High-Performance Accelerator for Large-Scale Convolutional Neural Networks. Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA\/IUCC), Guangzhou, China.","DOI":"10.1109\/ISPA\/IUCC.2017.00099"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Song, Z., Fu, B., Wu, F., Jiang, Z., Jiang, L., Jing, N., and Liang, X. (June, January 30). DRQ: Dynamic Region-based Quantization for Deep Neural Network Acceleration. 
Proceedings of the 2020 ACM\/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), Valencia, Spain.","DOI":"10.1109\/ISCA45697.2020.00086"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ottavi, G., Garofalo, A., Tagliavini, G., Conti, F., Benini, L., and Rossi, D. (2020, January 6\u20138). A Mixed-Precision RISC-V Processor for Extreme-Edge DNN Inference. Proceedings of the 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Limassol, Cyprus.","DOI":"10.1109\/ISVLSI49217.2020.000-5"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Bruschi, N., Garofalo, A., Conti, F., Tagliavini, G., and Rossi, D. (2020, January 15\u201317). Enabling mixed-precision quantized neural networks in extreme-edge devices. Proceedings of the 17th ACM International Conference on Computing Frontiers (CF\u201920), Siena, Italy.","DOI":"10.1145\/3387902.3394038"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhao, B., Li, J., Pan, H., and Wang, M. (2018, January 28\u201330). A High-Performance Reconfigurable Accelerator for Convolutional Neural Networks. Proceedings of the 3rd International Conference on Multimedia Systems and Signal Processing (ICMSSP\u201918), Shenzhen China.","DOI":"10.1145\/3220162.3220178"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kong, A., and Zhao, B. (March, January 28). A High Efficient Architecture for Convolution Neural Network Accelerator. Proceedings of the 2019 2nd International Conference on Intelligent Autonomous Systems (ICoIAS), Singapore.","DOI":"10.1109\/ICoIAS.2019.00029"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Xie, W., Zhang, C., Zhang, Y., Hu, C., Jiang, H., and Wang, Z. (2018, January 6\u20138). An Energy-Efficient FPGA-Based Embedded System for CNN Application. 
Proceedings of the 2018 IEEE International Conference on Electron Devices and Solid State Circuits (EDSSC), Shenzhen, China.","DOI":"10.1109\/EDSSC.2018.8487057"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Choi, J., Srinivasa, S., Tanabe, Y., Sampson, J., and Narayanan, V. (2018, January 8\u201311). A Power-Efficient Hybrid Architecture Design for Image Recognition Using CNNs. Proceedings of the 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Hong Kong, China.","DOI":"10.1109\/ISVLSI.2018.00015"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"135223","DOI":"10.1109\/ACCESS.2020.3011265","article-title":"McDRAM v2: In-Dynamic Random Access Memory Systolic Array Accelerator to Address the Large Model Problem in Deep Neural Networks on the Edge","volume":"8","author":"Cho","year":"2020","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ko, J.H., Long, Y., Amir, M.F., Kim, D., Kung, J., Na, T., Trivedi, A.R., and Mukhopadhyay, S. (2017, January 6\u20139). Energy-efficient neural image processing for Internet-of-Things edge devices. Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA.","DOI":"10.1109\/MWSCAS.2017.8053112"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Otseidu, K., Jia, T., Bryne, J., Hargrove, L., and Gu, J. (2018, January 5\u20138). Design and optimization of edge computing distributed neural processor for biomedical rehabilitation with sensor fusion. Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201918), San Diego, CA, USA.","DOI":"10.1145\/3240765.3240794"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Loh, J., Wen, J., and Gemmeke, T. (2020, January 7\u20138). Low-Cost DNN Hardware Accelerator for Wearable, High-Quality Cardiac Arrythmia Detection. 
Proceedings of the 2020 IEEE 31st International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Virtual.","DOI":"10.1109\/ASAP49362.2020.00042"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1145\/3283452","article-title":"Instruction Driven Cross-layer CNN Accelerator for Fast Detection on FPGA","volume":"11","author":"Yu","year":"2018","journal-title":"ACM Trans. Reconfigurable Technol. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, C., Fan, X., Zhang, S., Yang, Z., Wang, M., Wang, D., and Zhang, M. (2021, January 18\u201321). Hardware-Aware NAS Framework with Layer Adaptive Scheduling on Embedded System. Proceedings of the 26th Asia and South Pacific Design Automation Conference (ASPDAC\u201921), Tokyo, Japan.","DOI":"10.1145\/3394885.3431536"},{"key":"ref_23","unstructured":"Han, S., Mao, H., and Dally, W.J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/MM.2017.39","article-title":"Software-hardware codesign for efficient neural network acceleration","volume":"37","author":"Guo","year":"2017","journal-title":"IEEE Micro"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.neucom.2019.11.005","article-title":"ENAS oriented layer adaptive data scheduling strategy for resource limited hardware","volume":"381","author":"Li","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2700","DOI":"10.1109\/TVLSI.2017.2654506","article-title":"Near-Threshold RISC-V Core with DSP Extensions for Scalable IoT Endpoint Devices","volume":"25","author":"Gautschi","year":"2017","journal-title":"IEEE Trans. Very Large Scale Integr. 
VLSI Syst."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1109\/TC.2019.2941875","article-title":"Fast and Efficient Convolutional Accelerator for Edge Computing","volume":"69","author":"Ardakani","year":"2020","journal-title":"IEEE Trans. Comput."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6491\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:06:40Z","timestamp":1760166400000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/19\/6491"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,28]]},"references-count":27,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["s21196491"],"URL":"https:\/\/doi.org\/10.3390\/s21196491","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,9,28]]}}}