{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T02:33:40Z","timestamp":1778726020395,"version":"3.51.4"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,5,11]],"date-time":"2024-05-11T00:00:00Z","timestamp":1715385600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"U.S. Army Research Laboratory and the University of Maryland Baltimore County","award":["W911NF2120076"],"award-info":[{"award-number":["W911NF2120076"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>\n            Fine-tuning deep neural networks is pivotal for creating inference modules that can be suitably imported to edge or field-programmable gate array (FPGA) platforms. Traditionally, exploration of different parameters throughout the layers of deep neural networks has been done using grid search and other brute force techniques. Although these methods lead to the optimal choice of network parameters, the search process can be very time consuming and may not consider deployment constraints across different target platforms. This work addresses this problem by proposing Reg-Tune, a regression-based profiling approach to quickly determine the trend of different metrics in relation to hardware deployment of neural networks on tinyML platforms like FPGAs and edge devices. 
We start by training a handful of configurations belonging to different combinations of\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathcal {NN}\\scriptstyle \\langle q (quantization),\\,s (scaling)\\rangle \\displaystyle\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            or\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\mathcal {NN}\\scriptstyle \\langle r (resolution),\\,s\\rangle \\displaystyle\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            workloads to generate the accuracy values respectively for their corresponding application. Next, we deploy these configurations on the target device to generate energy\/latency values. According to our hypothesis, the most energy-efficient configuration suitable for deployment on the target device is a function of the variables\n            <jats:italic>q<\/jats:italic>\n            ,\n            <jats:italic>r<\/jats:italic>\n            , and\n            <jats:italic>s<\/jats:italic>\n            . Finally, these trained and deployed configurations and their related results are used as data points for polynomial regression with the variables\n            <jats:italic>q<\/jats:italic>\n            ,\n            <jats:italic>r<\/jats:italic>\n            , and\n            <jats:italic>s<\/jats:italic>\n            to realize the trend for accuracy\/energy\/latency on the target device. Our setup allows us to choose the near-optimal energy-consuming or latency-driven configuration for the desired accuracy from the contour profiles of energy\/latency across different tinyML device platforms. To this extent, we demonstrate the profiling process for three different case studies and across two platforms for energy and latency fine-tuning. 
Our approach results in at least 5.7\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            better energy efficiency when compared to recent implementations for human activity recognition on FPGA and 74.6% reduction in latency for semantic segmentation of aerial imagery on edge devices compared to baseline deployments.\n          <\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3623380","type":"journal-article","created":{"date-parts":[[2023,9,8]],"date-time":"2023-09-08T12:12:12Z","timestamp":1694175132000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Reg-Tune: A Regression-Focused Fine-Tuning Approach for Profiling Low Energy Consumption and Latency"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9550-7917","authenticated-orcid":false,"given":"Arnab Neelim","family":"Mazumder","sequence":"first","affiliation":[{"name":"University of Maryland Baltimore County, Baltimore, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6905-0781","authenticated-orcid":false,"given":"Farshad","family":"Safavi","sequence":"additional","affiliation":[{"name":"University of Maryland Baltimore County, Baltimore, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9358-2836","authenticated-orcid":false,"given":"Maryam","family":"Rahnemoonfar","sequence":"additional","affiliation":[{"name":"Lehigh University, Bethlehem, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5551-2124","authenticated-orcid":false,"given":"Tinoosh","family":"Mohsenin","sequence":"additional","affiliation":[{"name":"Johns Hopkins University, Baltimore, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,5,11]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"NVIDIA. 2021. Jetson Nano Developer Kit. (April 2021). Retrieved September 15 2023 from https:\/\/developer.nvidia.com\/embedded\/jetson-nano-developer-kit"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373087.3375334"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966166"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2018.2799821"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-13105-4_14"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2019.00012"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-021-00356-5"},{"key":"e_1_3_1_10_2","first-page":"73","volume-title":"Proceedings of the 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201918)","author":"Colangelo Philip","year":"2018","unstructured":"Philip Colangelo, Nasibeh Nasiri, Eriko Nurvitadhi, Asit Mishra, Martin Margala, and Kevin Nealis. 2018. Exploration of low numeric precision deep learning inference using Intel\u00ae FPGAs. In Proceedings of the 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201918). IEEE, Los Alamitos, CA, 73\u201380."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSD51259.2020.00057"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/DCOSS54816.2022.00061"},{"key":"e_1_3_1_13_2","article-title":"A reliable and low latency synchronizing middleware for co-simulation of a heterogeneous multi-robot systems","author":"Dey Emon","year":"2022","unstructured":"Emon Dey, Mikolaj Walczak, Mohammad Saeid Anwar, and Nirmalya Roy. 
2022. A reliable and low latency synchronizing middleware for co-simulation of a heterogeneous multi-robot systems. arXiv preprint arXiv:2211.05359 (2022).","journal-title":"arXiv preprint arXiv:2211.05359"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITCA52113.2020.00106"},{"key":"e_1_3_1_15_2","unstructured":"GitHub. 2019. Dusty-NV\/Jetson-inference: Hello AI World Guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson. Retrieved September 15 2023 from https:\/\/github.com\/dusty-nv\/jetson-inference"},{"key":"e_1_3_1_16_2","unstructured":"Forhan Bin Emdad Shuyuan Mary Ho Benhur Ravuri and Shezin Hussain. 2023. Towards a unified utilitarian ethics framework for healthcare artificial intelligence. In Proceedings of the 2023 Americas Conference on Information Systems (AMCIS\u201923)."},{"key":"e_1_3_1_17_2","first-page":"128","article-title":"Towards interpretable multimodal predictive models for early mortality prediction of hemorrhagic stroke patients","volume":"2023","author":"Emdad Forhan Bin","year":"2023","unstructured":"Forhan Bin Emdad, Shubo Tian, Esha Nandy, Karim Hanna, and Zhe He. 2023. Towards interpretable multimodal predictive models for early mortality prediction of hemorrhagic stroke patients. AMIA Summits on Translational Science Proceedings 2023 (2023), 128.","journal-title":"AMIA Summits on Translational Science Proceedings"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2900084"},{"key":"e_1_3_1_19_2","unstructured":"Yuanduo Hong Huihui Pan Weichao Sun and Yisong Jia. 2021. Deep dual-resolution networks for real-time and accurate semantic segmentation of road scenes. 
arXiv:2101.06085 (2021)."},{"key":"e_1_3_1_20_2","volume-title":"Proceedings of the Hardware Aware Efficient Training Workshop of ICLR 2021","author":"Hosseini Morteza","year":"2021","unstructured":"Morteza Hosseini, Mohammad Ebrahimabadi, Arnab Neelim Mazumder, Houman Homayoun, and Tinoosh Mohsenin. 2021. A fast method to fine-tune neural networks for the least energy consumption on FPGAs. In Proceedings of the Hardware Aware Efficient Training Workshop of ICLR 2021."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSSC.2021.3111431"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2021.3110250"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3423136"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2021.3127932"},{"key":"e_1_3_1_25_2","article-title":"MobileNets: Efficient convolutional neural networks for mobile vision applications","volume":"1704","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR abs\/1704.04861 (2017). http:\/\/arxiv.org\/abs\/1704.04861","journal-title":"CoRR"},{"key":"e_1_3_1_26_2","volume-title":"Advances in Neural Information Processing Systems","author":"Hubara Itay","year":"2016","unstructured":"Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks. In Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Eds.), Vol. 29. Curran Associates, Red Hook, NY, 1\u20139. 
https:\/\/proceedings.neurips.cc\/paper\/2016\/file\/d8330f857a17c53d217014ee776bfd50-Paper.pdf"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2018.2848647"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2926381"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477034"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/IAEAC47372.2019.8997842"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/OJCAS.2020.3043737"},{"key":"e_1_3_1_33_2","unstructured":"Tejaswini Manjunath Mozhgan Navardi Prakhar Dixit Bharat Prakash and Tinoosh Mohsenin. 2023. ReProHRL: Towards multi-goal navigation in the real world using hierarchical agents. arXiv:cs.RO\/2308.08737 (2023)."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2021.3129415"},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Arnab Neelim Mazumder Niall Lyons Ashutosh Pandey Avik Santra and Tinoosh Mohsenin. 2023. Harnessing the power of explanations for incremental training: A LIME-based approach. arXiv:cs.LG\/2211.01413 (2023).","DOI":"10.23919\/EUSIPCO58844.2023.10289904"},{"key":"e_1_3_1_36_2","article-title":"A fast network exploration strategy to profile low energy consumption for keyword spotting","volume":"2202","author":"Mazumder Arnab Neelim","year":"2022","unstructured":"Arnab Neelim Mazumder and Tinoosh Mohsenin. 2022. A fast network exploration strategy to profile low energy consumption for keyword spotting. CoRR abs\/2202.02361 (2022). https:\/\/arxiv.org\/abs\/2202.02361","journal-title":"CoRR"},{"key":"e_1_3_1_37_2","unstructured":"Mozhgan Navardi Prakhar Dixit Tejaswini Manjunath Nicholas R. Waytowich Tinoosh Mohsenin and Tim Oates. 2022. Toward Real-World Implementation of Deep Reinforcement Learning for Vision-Based Autonomous Drone Navigation with Mission. 
Retrieved September 15 2023 from https:\/\/sim2real.github.io\/assets\/papers\/2022\/navardi.pdf"},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Mozhgan Navardi and Tinoosh Mohsenin. 2023. MLAE2: Metareasoning for latency-aware energy-efficient autonomous nano-drones. In Proceedings of the 2023 IEEE International Symposium on Circuits and Systems (ISCAS\u201923).","DOI":"10.1109\/ISCAS46773.2023.10181715"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/AICAS54282.2022.9869975"},{"key":"e_1_3_1_40_2","article-title":"Towards an interpretable hierarchical agent framework using semantic goals","author":"Prakash Bharat","year":"2022","unstructured":"Bharat Prakash, Nicholas Waytowich, Tim Oates, and Tinoosh Mohsenin. 2022. Towards an interpretable hierarchical agent framework using semantic goals. In Proceedings of the 10th Annual AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction (AI-HRI\u201922).","journal-title":"Proceedings of the 10th Annual AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction (AI-HRI\u201922)."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056850"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-96756-7_10"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3090981"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3595633"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISWC.2012.13"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_1_47_2","doi-asserted-by":"crossref","unstructured":"Olaf Ronneberger Philipp Fischer and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. 
arXiv:cs.CV\/1505.04597 (2015).","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783720"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR56361.2022.9956211"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00293"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics8111321"},{"key":"e_1_3_1_53_2","unstructured":"Changqian Yu Changxin Gao Jingbo Wang Gang Yu Chunhua Shen and Nong Sang. 2020. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation. arXiv:cs.CV\/2004.02147 (2020)."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.3390\/app8040504"},{"key":"e_1_3_1_55_2","unstructured":"Shuchang Zhou Yuxin Wu Zekun Ni Xinyu Zhou He Wen and Yuheng Zou. 2018. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv:cs.NE\/1606.06160 (2018)."},{"key":"e_1_3_1_56_2","doi-asserted-by":"crossref","unstructured":"Barret Zoph Vijay Vasudevan Jonathon Shlens and Quoc V. Le. 2018. Learning transferable architectures for scalable image recognition. 
arXiv:cs.CV\/1707.07012 (2018).","DOI":"10.1109\/CVPR.2018.00907"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623380","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3623380","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:26Z","timestamp":1750178186000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3623380"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,11]]},"references-count":55,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3623380"],"URL":"https:\/\/doi.org\/10.1145\/3623380","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,11]]},"assertion":[{"value":"2022-10-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}