{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,1]],"date-time":"2026-03-01T09:16:23Z","timestamp":1772356583005,"version":"3.50.1"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T00:00:00Z","timestamp":1736121600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science and Technology Council in Taiwan","award":["111-2221-E-A49-131-MY3"],"award-info":[{"award-number":["111-2221-E-A49-131-MY3"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>The rapid growth of on-device artificial intelligence increases the importance of TinyML inference applications. However, the stringent tiny memory space on the microcontroller unit (MCU) raises the grand challenge when deploying deep neural network (DNN) models on such a resource-constrained embedded system device. Traditionally, the machine learning system platform executes operators in a layer-wise manner. The layer-wise inference continues to the next operator before completing an operator. Thus, the DNN model compiler needs to allocate the SRAM memory space to store an operator\u2019s entire input and output tensor when using the layer-wise inference on an MCU. However, the layer-wise inference will run out of memory quickly when an operator\u2019s input and output tensor size in a DNN model is large. Consequently, the patch-based inference work divides a tensor into multiple small patches and only stores a small one to reduce the peak SRAM memory usage on an MCU. 
However, recomputing the overlapping regions of patches tremendously increases the computational overhead of patch-based inference and makes it undesirable on an MCU. StreamNet, a TinyML model compilation framework, therefore employs a stream buffer to eliminate the redundant computation of patch-based inference while using a small SRAM memory space on an MCU. However, StreamNet typically uses only one type of patch configuration in a DNN model and thus does not completely eliminate the memory bottleneck of TinyML models. Unlike StreamNet, this article designs StreamNet++, a patch-based variant inference that uses several types of patch configurations to completely remove the memory bottleneck that remains even when using StreamNet. Furthermore, StreamNet++ provides a parameter selection algorithm that quickly yields the best patch parameter candidates to meet the memory constraints of different MCUs. As a result, across 10 TinyML models, StreamNet++ 2D stream processing achieves a geometric mean speedup of 5.7X and removes 78% of redundant MACs over the latest patch-based inference.<\/jats:p>","DOI":"10.1145\/3706107","type":"journal-article","created":{"date-parts":[[2024,11,29]],"date-time":"2024-11-29T09:29:47Z","timestamp":1732872587000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["StreamNet++: Memory-Efficient Streaming TinyML Model Compilation on Microcontrollers"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-4456-9667","authenticated-orcid":false,"given":"Chen-Fong","family":"Hsu","sequence":"first","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6408-9996","authenticated-orcid":false,"given":"Hong-Sheng","family":"Zheng","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, 
Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6796-8761","authenticated-orcid":false,"given":"Yu-Yuan","family":"Liu","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2401-9916","authenticated-orcid":false,"given":"Tsung Tai","family":"Yeh","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2025,1,6]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2020b. TensorFlow Lite Guide portable data schema. Retrieved April 5 2024 from https:\/\/www.tensorflow.org\/lite\/guide"},{"key":"e_1_3_1_3_2","unstructured":"2023. microTVM: TVM on bare-metal. Retrieved April 5 2024 from https:\/\/tvm.apache.org\/docs\/topic\/microtvm\/index.html"},{"key":"e_1_3_1_4_2","first-page":"1","volume-title":"Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)","author":"Alwani Manoj","year":"2016","unstructured":"Manoj Alwani, Han Chen, Michael Ferdman, and Peter Milder. 2016. Fused-layer CNN accelerators. In Proceedings of the 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). IEEE, 1\u201312."},{"key":"e_1_3_1_5_2","article-title":"Arm Mbed CLI","author":"Ltd. ARM","year":"2023","unstructured":"ARM Ltd.2023. Arm Mbed CLI. [Online]. Retrieved May 16, 2023 from https:\/\/github.com\/ARMmbed\/mbed-cli","journal-title":"[Online]. Retrieved May 16, 2023 from https:\/\/github.com\/ARMmbed\/mbed-cli"},{"key":"e_1_3_1_6_2","unstructured":"H. Cai L. Zhu and S. Han. 2018. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2020.2983648"},{"key":"e_1_3_1_8_2","unstructured":"T. Chen T. Moreau Z. Jiang L. Zheng E. Yan H. Shen and A. Krishnamurthy. 2018. 
TVM: An automated End-to-End optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). 578\u2013594."},{"key":"e_1_3_1_9_2","first-page":"5904","volume-title":"Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Chen Xie","year":"2021","unstructured":"Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, and Jinyu Li. 2021. Developing real-time streaming transformer transducer for speech recognition on large-scale dataset. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5904\u20135908."},{"key":"e_1_3_1_10_2","unstructured":"J. Choi Z. Wang S. Venkataramani P. I. J. Chuang V. Srinivasan and K. Gopalakrishnan. 2018. Pact: Parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085."},{"key":"e_1_3_1_11_2","first-page":"6351","volume-title":"Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Coucke Alice","year":"2019","unstructured":"Alice Coucke, Mohammed Chlieh, Thibault Gisselbrecht, David Leroy, Mathieu Poumeyrol, and Thibaut Lavril. 2019. Efficient keyword spotting using dilated convolutions and gating. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6351\u20136355."},{"key":"e_1_3_1_12_2","first-page":"800","article-title":"Tensorflow lite micro: Embedded machine learning for tinyml systems","volume":"3","author":"David Robert","year":"2021","unstructured":"Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Tiezhen Wang, et\u00a0al. 2021. Tensorflow lite micro: Embedded machine learning for tinyml systems. 
Proceedings of Machine Learning and Systems 3 (2021), 800\u2013811.","journal-title":"Proceedings of Machine Learning and Systems"},{"issue":"55","key":"e_1_3_1_13_2","first-page":"1","article-title":"Neural Architecture Search: A Survey","volume":"20","author":"Elsken Thomas","year":"2019","unstructured":"Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. 2019. Neural Architecture Search: A Survey. Journal of Machine Learning Research 20, 55 (2019), 1\u201321. Retrieved from http:\/\/jmlr.org\/papers\/v20\/18-598.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/800152.804907"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2019.2905361"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"S. Han X. Liu H. Mao J. Pu A. Pedram M. A. Horowitz and W. J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. ACM SIGARCH Computer Architecture News 44 3 (2016) 243\u2013254.","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_1_17_2","unstructured":"S. Han H. Mao and W. J. Dally. 2015. Deep compression: Compressing deep neural networks with pruning trained quantization and huffman coding. arXiv preprint arXiv:1510.00149."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380200104"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_48"},{"key":"e_1_3_1_20_2","first-page":"6381","volume-title":"Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"He Yanzhang","year":"2019","unstructured":"Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, et\u00a0al. 2019. Streaming end-to-end speech recognition for mobile devices. 
In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6381\u20136385."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_1_22_2","unstructured":"L. Lai N. Suda and V. Chandra. 2018. Cmsis-nn: Efficient neural network kernels for arm cortex-m cpus. arXiv preprint arXiv:1801.06601."},{"key":"e_1_3_1_23_2","unstructured":"H. F. Langroudi V. Karia T. Pandit and D. Kudithipudi. 2021. Tent: Efficient quantization of neural networks on the tiny edge with tapered fixed point. arXiv preprint arXiv:2104.02233."},{"key":"e_1_3_1_24_2","volume-title":"Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS)","author":"Lin Ji","year":"2021","unstructured":"Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, and Song Han. 2021. MCUNetV2: Memory-efficient patch-based inference for tiny deep learning. In Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_1_25_2","unstructured":"J. Lin W. M. Chen Y. Lin C. Gan and S. Han. 2020. Mcunet: Tiny deep learning on iot devices. Advances in Neural Information Processing Systems 33 (2020) 11711\u201311722."},{"key":"e_1_3_1_26_2","article-title":"Runtime neural pruning","volume":"30","author":"Lin Ji","year":"2017","unstructured":"Ji Lin, Yongming Rao, Jiwen Lu, and Jie Zhou. 2017. Runtime neural pruning. Advances in Neural Information Processing Systems 30 (2017), 2181\u20132191.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_27_2","article-title":"MCUNet Model Zoo","author":"Song Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han,","year":"2023","unstructured":"Lin, Ji and Chen, Wei-Ming and Lin, Yujun and Gan, Chuang and Han, Song. 2023. MCUNet Model Zoo. [Online]. 
Retrieved May 16, 2023 from https:\/\/github.com\/mit-han-lab\/mcunet\/blob\/master\/mcunet\/model_zoo.py","journal-title":"[Online]. Retrieved May 16, 2023 from"},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","unstructured":"H. I. C. Liu M. Brehler M. Ravishankar N. Vasilache B. Vanik and S. Laurenzo. 2022. Tinyiree: An ml execution environment for embedded systems from compilation to deployment. IEEE Micro 42 5 (2022) 9\u201316.","DOI":"10.1109\/MM.2022.3178068"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.298"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00339"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/98124"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1007\/978-3-319-46493-0_32","volume-title":"Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Part IV","author":"Rastegari Mohammad","year":"2016","unstructured":"Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. Xnor-net: Imagenet classification using binary convolutional neural networks. In Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Part IV. Springer, 525\u2013542."},{"key":"e_1_3_1_34_2","first-page":"326","article-title":"Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers","volume":"2","author":"Rusci Manuele","year":"2020","unstructured":"Manuele Rusci, Alessandro Capotondi, and Luca Benini. 2020. Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers. 
Proceedings of Machine Learning and Systems 2 (2020), 326\u2013335.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"O. Rybakov N. Kononenko N. Subrahmanya M. Visontai and S. Laurenzo. 2020. Streaming keyword spotting on mobile devices. arXiv preprint arXiv:2005.06720.","DOI":"10.21437\/Interspeech.2020-1003"},{"key":"e_1_3_1_36_2","article-title":"STM32F767ZI Datasheet","author":"Ltd. STI","year":"2023","unstructured":"STI Ltd. 2023. STM32F767ZI Datasheet. [Online]. Retrieved May 16, 2023 from https:\/\/www.st.com\/en\/microcontrollers-microprocessors\/stm32f767zi.html","journal-title":"[Online]. Retrieved May 16, 2023 from"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00881"},{"key":"e_1_3_1_38_2","first-page":"5864","volume-title":"Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Wang Yiming","year":"2021","unstructured":"Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, and Sanjeev Khudanpur. 2021. Wake word detection with streaming transformers. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5864\u20135868."},{"key":"e_1_3_1_39_2","unstructured":"Y. Xu L. Xie X. Zhang X. Chen B. Shi Q. Tian and H. Xiong. 2020. Latency-aware differentiable neural architecture search. arXiv preprint arXiv:2001.06392."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3369382"},{"key":"e_1_3_1_41_2","unstructured":"S. Zhou Y. Wu Z. Ni X. Zhou H. Wen and Y. Zou. 2016. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/2259016.2259044"},{"key":"e_1_3_1_43_2","unstructured":"C. Zhu S. Han H. Mao and W. J. Dally. 2016. Trained ternary quantization. 
arXiv preprint arXiv:1612.01064."}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3706107","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3706107","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:03Z","timestamp":1750295883000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3706107"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,6]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3706107"],"URL":"https:\/\/doi.org\/10.1145\/3706107","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,6]]},"assertion":[{"value":"2024-06-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}