{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T14:36:46Z","timestamp":1773931006279,"version":"3.50.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,9,30]],"date-time":"2022-09-30T00:00:00Z","timestamp":1664496000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"crossref"}]},{"name":"System Level Design (SLD) thrust"},{"name":"Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            <jats:bold>Machine learning (ML)<\/jats:bold>\n            on resource-constrained edge devices is expensive and often requires offloading computation to the cloud, which may compromise the privacy of user data. In contrast, the type of data processed at edge devices is user-specific and limited to a few inference classes. In this work, we explore building smaller, user-specific machine learning models, rather than utilizing a generic, compute-intensive machine learning model that caters to a diverse range of users. We first present a hardware-friendly, lightweight pruning technique to create user-specific models directly on mobile platforms, while simultaneously executing inferences. The proposed technique leverages compute sharing between pruning and inference, customizes the backward pass of training, and chooses a pruning granularity for efficient processing on edge. We then propose architectural support to prune user-specific models on a systolic edge ML inference accelerator. We demonstrate that user-specific models provide a speedup of 2.9\u00d7 and 2.3\u00d7 on the mobile CPUs for the ResNet-50 and Inception-V3 models.\n          <\/jats:p>","DOI":"10.1145\/3524125","type":"journal-article","created":{"date-parts":[[2022,3,31]],"date-time":"2022-03-31T12:06:24Z","timestamp":1648728384000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Hardware-friendly User-specific Machine Learning for Edge Devices"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5008-3049","authenticated-orcid":false,"given":"Vidushi","family":"Goyal","sequence":"first","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan, USA"}]},{"given":"Reetuparna","family":"Das","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan, USA"}]},{"given":"Valeria","family":"Bertacco","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan, USA"}]}],"member":"320","published-online":{"date-parts":[[2022,10,8]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"[n.d.]. Edge Tpu. https:\/\/cloud.google.com\/edge-tpu."},{"key":"e_1_3_1_3_2","unstructured":"[n.d.]. Edge TPU Performance Benchmarks. https:\/\/coral.ai\/docs\/edgetpu\/benchmarks\/."},{"key":"e_1_3_1_4_2","unstructured":"[n.d.]. Intel Image Classification: Image Scene Classification of Multiclass. https:\/\/www.kaggle.com\/puneet6060\/intel-image-classification\/version\/2."},{"key":"e_1_3_1_5_2","unstructured":"[n.d.]. iPhone 12 Pro Specifications. https:\/\/www.apple.com\/iphone-12-pro\/."},{"key":"e_1_3_1_6_2","unstructured":"[n.d.]. What is the NPU in Galaxy and What Does It Do?https:\/\/www.samsung.com\/global\/galaxy\/what-is\/npu\/."},{"key":"e_1_3_1_7_2","unstructured":"[n.d.]. XNNPACK. https:\/\/github.com\/google\/XNNPACK."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/tnet.2020.2983119"},{"key":"e_1_3_1_9_2","article-title":"N2N learning: Network to network compression via policy gradient reinforcement learning","author":"Ashok Anubhav","year":"2017","unstructured":"Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, and Kris M. Kitani. 2017. N2N learning: Network to network compression via policy gradient reinforcement learning. arXiv preprint arXiv:1709.06030 (2017).","journal-title":"arXiv preprint arXiv:1709.06030"},{"key":"e_1_3_1_10_2","article-title":"Towards federated learning at scale: \u2018System design","author":"Bonawitz Keith","year":"2019","unstructured":"Keith Bonawitz, Hubert Eichner, Wolfgang Grieskamp, Dzmitry Huba, Alex Ingerman, Vladimir Ivanov, Chloe Kiddon, Jakub Kone\u010dny, Stefano Mazzocchi, H. Brendan McMahan, Timon Van Overveldt, David Petrou, Daniel Ramage, and Jason Roselander. 2019. Towards federated learning at scale: \u2018System design. arXiv preprint arXiv:1902.01046 (2019).","journal-title":"arXiv preprint arXiv:1902.01046"},{"key":"e_1_3_1_11_2","article-title":"\\( EVA^{2} \\) : Exploiting temporal redundancy in live computer vision","author":"Buckler Mark","year":"2018","unstructured":"Mark Buckler, Philip Bedoukian, Suren Jayasuriya, and Adrian Sampson. 2018. \\( EVA^{2} \\) : Exploiting temporal redundancy in live computer vision. arXiv preprint arXiv:1803.06312 (2018).","journal-title":"arXiv preprint arXiv:1803.06312"},{"key":"e_1_3_1_12_2","first-page":"678","volume-title":"International Conference on Machine Learning","author":"Cai Han","year":"2018","unstructured":"Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. 2018. Path-level network transformation for efficient architecture search. In International Conference on Machine Learning. PMLR, 678\u2013687."},{"key":"e_1_3_1_13_2","article-title":"Net2Net: Accelerating learning via knowledge transfer","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen, Ian Goodfellow, and Jonathon Shlens. 2015. Net2Net: Accelerating learning via knowledge transfer. arXiv preprint arXiv:1511.05641 (2015).","journal-title":"arXiv preprint arXiv:1511.05641"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.40"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_16_2","first-page":"1","volume-title":"2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)","author":"Fowers Jeremy","year":"2018","unstructured":"Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, et\u00a0al. 2018. A configurable cloud-scale DNN processor for real-time AI. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 1\u201314."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00070"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414655"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00161"},{"key":"e_1_3_1_20_2","volume-title":"Proc IISWC","author":"Hadidi Ramyad","year":"2019","unstructured":"Ramyad Hadidi et\u00a0al. 2019. Characterizing the deployment of deep neural networks on commercial edge devices. In Proc IISWC."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021745"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969239.2969366"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2020.2976585"},{"key":"e_1_3_1_24_2","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Identity Mappings in Deep Residual Networks. arxiv:1603.05027 [cs.CV]"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_48"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_1_27_2","article-title":"Distilling the knowledge in a neural network","author":"Hinton Geoffrey","year":"2015","unstructured":"Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).","journal-title":"arXiv preprint arXiv:1503.02531"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303949"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358263"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11036-017-0962-2"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00070"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3093336.3037698"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TWC.2020.2996144"},{"key":"e_1_3_1_35_2","article-title":"Federated learning: Strategies for improving communication efficiency","author":"Kone\u010dn\u1ef3 Jakub","year":"2016","unstructured":"Jakub Kone\u010dn\u1ef3, H. Brendan McMahan, Felix X. Yu, Peter Richt\u00e1rik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016).","journal-title":"arXiv preprint arXiv:1610.05492"},{"key":"e_1_3_1_36_2","article-title":"Pruning filters for efficient ConvNets","author":"Li Hao","year":"2016","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2016. Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710 (2016).","journal-title":"arXiv preprint arXiv:1608.08710"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173191"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSE.2018.2848960"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.541"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356156"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00069"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2974843"},{"key":"e_1_3_1_43_2","article-title":"Deep learning recommendation model for personalization and recommendation systems","author":"Naumov Maxim","year":"2019","unstructured":"Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, et\u00a0al. 2019. Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091 (2019).","journal-title":"arXiv preprint arXiv:1906.00091"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378534"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080254"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00016"},{"key":"e_1_3_1_47_2","article-title":"FitNets: Hints for thin deep nets","author":"Romero Adriana","year":"2014","unstructured":"Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2014. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550 (2014).","journal-title":"arXiv preprint arXiv:1412.6550"},{"key":"e_1_3_1_48_2","article-title":"SCALE-Sim: Systolic CNN accelerator simulator","author":"Samajdar Ananda","year":"2018","unstructured":"Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. SCALE-Sim: Systolic CNN accelerator simulator. arXiv preprint arXiv:1811.02883 (2018).","journal-title":"arXiv preprint arXiv:1811.02883"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.43"},{"key":"e_1_3_1_50_2","first-page":"1","volume-title":"2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)","author":"Samuel Neev","year":"2017","unstructured":"Neev Samuel, Tzvi Diskin, and Ami Wiesel. 2017. Deep MIMO detection. In 2017 IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 1\u20135."},{"key":"e_1_3_1_51_2","article-title":"Two-stream convolutional networks for action recognition in videos","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199 (2014).","journal-title":"arXiv preprint arXiv:1406.2199"},{"key":"e_1_3_1_52_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).","journal-title":"arXiv preprint arXiv:1409.1556"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00018"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2017.2761740"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2019.2944584"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00029"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/3410463.3414654"},{"key":"e_1_3_1_60_2","article-title":"FixyNN: Efficient hardware for mobile computer vision via transfer learning","author":"Whatmough Paul N.","year":"2019","unstructured":"Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, and Matthew Mattina. 2019. FixyNN: Efficient hardware for mobile computer vision via transfer learning. arXiv preprint arXiv:1902.11128 (2019).","journal-title":"arXiv preprint arXiv:1902.11128"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00048"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/LWC.2017.2757490"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969197"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3140659.3080215"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00958"},{"key":"e_1_3_1_66_2","first-page":"292","volume-title":"2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA)","author":"Zhang Jiaqi","year":"2019","unstructured":"Jiaqi Zhang, Xiangru Chen, Mingcong Song, and Tao Li. 2019. Eager pruning: Algorithm and architecture support for fast training of deep neural networks. In 2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA). IEEE, 292\u2013303."},{"key":"e_1_3_1_67_2","article-title":"Euphrates: Algorithm-SoC Co-design for low-power mobile continuous vision","author":"Zhu Yuhao","year":"2018","unstructured":"Yuhao Zhu, Anand Samajdar, Matthew Mattina, and Paul Whatmough. 2018. Euphrates: Algorithm-SoC Co-design for low-power mobile continuous vision. arXiv preprint arXiv:1803.11232 (2018).","journal-title":"arXiv preprint arXiv:1803.11232"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524125","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524125","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:05Z","timestamp":1750188665000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524125"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,30]]},"references-count":66,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3524125"],"URL":"https:\/\/doi.org\/10.1145\/3524125","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,30]]},"assertion":[{"value":"2021-07-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-10-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}