{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T11:25:48Z","timestamp":1780053948867,"version":"3.54.0"},"reference-count":81,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2020,1,31]],"date-time":"2020-01-31T00:00:00Z","timestamp":1580428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/M015734\/1,EP\/M01567X\/1"],"award-info":[{"award-number":["EP\/M015734\/1,EP\/M01567X\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2020,1,31]]},"abstract":"<jats:p>Deep neural networks (DNNs) are becoming a key enabling technique for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to prohibitively long inferencing time and resource requirements of many DNNs. Offloading computation into the cloud is often unacceptable due to privacy concerns, high latency, or the lack of connectivity. Although compression algorithms often succeed in reducing inferencing times, they come at the cost of reduced accuracy.<\/jats:p>\n          <jats:p>This article presents a new, alternative approach to enable efficient execution of DNNs on embedded devices. Our approach dynamically determines which DNN to use for a given input by considering the desired accuracy and inference time. 
It employs machine learning to develop a low-cost predictive model to quickly select a pre-trained DNN to use for a given input and the optimization constraint. We achieve this first by offline training a predictive model and then using the learned model to select a DNN model to use for new, unseen inputs. We apply our approach to two representative DNN domains: image classification and machine translation. We evaluate our approach on a Jetson TX2 embedded deep learning platform and consider a range of influential DNN models including convolutional and recurrent neural networks. For image classification, we achieve a 1.8x reduction in inference time with a 7.52% improvement in accuracy over the most capable single DNN model. For machine translation, we achieve a 1.34x reduction in inference time over the most capable single model with little impact on the quality of translation.<\/jats:p>","DOI":"10.1145\/3371154","type":"journal-article","created":{"date-parts":[[2020,2,7]],"date-time":"2020-02-07T07:03:59Z","timestamp":1581059039000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":66,"title":["Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection"],"prefix":"10.1145","volume":"19","author":[{"given":"Vicent Sanz","family":"Marco","sequence":"first","affiliation":[{"name":"Osaka University, Japan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1089-6886","authenticated-orcid":false,"given":"Ben","family":"Taylor","sequence":"additional","affiliation":[{"name":"Lancaster University, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zheng","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Leeds, United 
Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yehia","family":"Elkhatib","sequence":"additional","affiliation":[{"name":"Lancaster University, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,2,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"J. J. Allaire Dirk Eddelbuettel Nick Golding and Yuan Tang. 2016. TensorFlow for R. Available at https:\/\/tensorflow.rstudio.com\/."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of ICML\u201916","author":"Amodei Dario","year":"2016"},{"key":"e_1_2_1_3_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473."},{"key":"e_1_2_1_4_1","unstructured":"Jiawang Bai Yiming Li Jiawei Li Yong Jiang and Shutao Xia. 2019. Rectified decision trees: Towards interpretability compression and empirical soundness. arXiv:1903.05965."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of SenSys\u201916","author":"Bhattacharya Sourav"},{"key":"e_1_2_1_6_1","unstructured":"Alfredo Canziani Adam Paszke and Eugenio Culurciello. 2016. An analysis of deep neural network models for practical applications. 
arXiv:1605.07678."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-019-00646-x"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCC\/SmartCity\/DSS.2018.00116"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of ICML\u201915","author":"Chen Wenlin","year":"2015"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.24"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2017.7863731"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541941"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of ICML\u201914","author":"Donahue Jeff","year":"2014"},{"key":"e_1_2_1_14_1","volume-title":"Embracing Global Computing in Emerging Economies. Communications in Computer and Information Science","author":"Elkhatib Yehia"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2017.35"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of CGO\u201913","author":"Emani Murali Krishna"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2737999"},{"key":"e_1_2_1_18_1","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun et al. 2016. Google\u2019s Neural Machine Translation system: Bridging the gap between human and machine translation. arXiv:1609.08144."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3131895"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of HiPEAC\u201911","author":"Grewe Dominik"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of LCPC\u201913","author":"Grewe Dominik"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of CGO\u201913","author":"O\u2019Boyle Michael F. 
P.","year":"2013"},{"key":"e_1_2_1_23_1","unstructured":"Tian Guo. 2017. Towards efficient deep inference for mobile applications. arXiv:1707.04610."},{"key":"e_1_2_1_24_1","volume-title":"Dally","author":"Han Song","year":"2015"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of NIPS\u201915","author":"Han Song"},{"key":"e_1_2_1_26_1","volume-title":"Dally","author":"Han Song","year":"2016"},{"key":"e_1_2_1_27_1","volume-title":"Aly Amin Abdelmgeid, and Hammam A. Alshazly","author":"Hassaballah M.","year":"2016"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"e_1_2_1_30_1","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081360"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of ICML\u201915","author":"Ioffe Sergey","year":"2015"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/E17-2068"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037698"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the ALTA\u201906 Workshop.","author":"Khoo Anthony","year":"2006"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Yoon Kim. 2014. Convolutional neural networks for sentence classification. 
arXiv:1408.5882.","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_2_1_38_1","unstructured":"Aaron Klein Stefan Falkner Simon Bartels Philipp Hennig and Frank Hutter. 2016. Fast Bayesian optimization of machine learning hyperparameters on large datasets. arXiv:1605.07079."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2018.2381129"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2973801"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the ALTA\u201912 Workshop.","author":"Lui Marco","year":"2012"},{"key":"e_1_2_1_43_1","volume-title":"Retrieved","author":"Luong Minh-Thang","year":"2017"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3110025.3110075"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135984"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135984"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126555"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of LCPC\u201914","author":"Ogilvie William F.","year":"2014"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2017.7863744"},{"key":"e_1_2_1_50_1","volume-title":"Sina Sajadmanesh, Ali Taheri, Kleomenis Katevas, Hamid R. Rabiee, Nicholas D. Lane, and Hamed Haddadi.","author":"Ossia Seyed Ali","year":"2017"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.29.41"},{"key":"e_1_2_1_52_1","volume-title":"S. Karthikeyan, Ramesh Govindan, B. S. 
Manjunath, and Rahul Urgaonkar.","author":"Rallapalli Swati","year":"2016"},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Mohammad Rastegari Vicente Ordonez Joseph Redmon and Ali Farhadi. 2016. XNOR-Net: ImageNet classification using binary convolutional neural networks. arXiv:1603.05279.","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_1_54_1","unstructured":"Sujith Ravi. 2015. ProjectionNet: Learning efficient on-device deep networks using neural projections. arXiv:1708.00630."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of INFOCOM\u201917","author":"Jie"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3281411.3281422"},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Sandra Servia Rodr\u00edguez Liang Wang Jianxin R. Zhao Richard Mortier and Hamed Haddadi. 2017. Privacy-preserving personal model training. arXiv:1703.00380.","DOI":"10.1109\/IoTDI.2018.00024"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_59_1","volume-title":"Proceedings of NOMS\u201916","author":"Faiza"},{"key":"e_1_2_1_60_1","volume-title":"Blair","author":"Samreen Faiza","year":"2019"},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Danielle Saunders Felix Stahlberg Adria de Gispert and Bill Byrne. 2018. 
Multi-representation ensembles and delayed SGD updates improve syntax-based NMT. arXiv:1805.00456.","DOI":"10.18653\/v1\/P18-2051"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.5555\/1390681.1390693"},{"key":"e_1_2_1_63_1","volume-title":"Retrieved","author":"Silberman Nathan","year":"2013"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.52"},{"key":"e_1_2_1_65_1","doi-asserted-by":"crossref","unstructured":"Felix Stahlberg Adria de Gispert and Bill Byrne. 2018. The University of Cambridge\u2019s machine translation systems for WMT18. arXiv:1808.09465.","DOI":"10.18653\/v1\/W18-6427"},{"key":"e_1_2_1_66_1","volume-title":"Proceedings of NIPS\u201914","author":"Sun Yi","year":"2014"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078633.3081040"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211332.3211336"},{"key":"e_1_2_1_69_1","volume-title":"Proceedings of ICDCS\u201917","author":"Teerapittayanon Surat"},{"key":"e_1_2_1_70_1","volume-title":"Proceedings of PLDI\u201909","author":"Tournavitis Georgios"},{"key":"e_1_2_1_71_1","volume-title":"2015 Tenth Workshop on Statistical Machine Translation.","author":"EMNLP","year":"2015"},{"key":"e_1_2_1_72_1","volume-title":"Yehia Elkhatib, Amir Hussain, and Ala Al-Fuqaha.","author":"Usama Muhammad","year":"2017"},{"key":"e_1_2_1_73_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. 
arXiv:1706.03762."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/2677036"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/2579561"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2018.2817118"},{"key":"e_1_2_1_77_1","volume-title":"Proceedings of PPoPP\u201909","author":"Wang Zheng"},{"key":"e_1_2_1_78_1","volume-title":"Proceedings of PACT\u201910","author":"Wang Zheng"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/2512436"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241570"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2018.00061"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3371154","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3371154","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:05:44Z","timestamp":1750273544000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3371154"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,31]]},"references-count":81,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,31]]}},"alternative-id":["10.1145\/3371154"],"URL":"https:\/\/doi.org\/10.1145\/3371154","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,31]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2019-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-02-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}