{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T08:49:57Z","timestamp":1776847797484,"version":"3.51.2"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"DOI":"10.13039\/100000001","name":"U.S. National Science Foundation","doi-asserted-by":"crossref","award":["CNS-2336886"],"award-info":[{"award-number":["CNS-2336886"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2026,6,30]]},"abstract":"<jats:p>Power capping is an important technique for high-density servers to safely oversubscribe the power infrastructure in a data center. However, power capping is commonly accomplished by dynamically lowering the server processors\u2019 frequency levels, which can result in degraded application performance. For servers that run important machine learning (ML) applications with Service-Level Objective (SLO) requirements, inference performance such as recognition accuracy must be optimized within a certain latency constraint, which demands high server performance. To achieve the best inference accuracy under the desired latency and server power constraints, this article proposes OptimML, a multi-input-multi-output (MIMO) control framework that jointly controls both inference latency and server power consumption, by flexibly adjusting the ML model size (and so its required computing resources) when server frequency needs to be lowered for power capping. Our results on a hardware testbed with widely adopted ML frameworks (including PyTorch, TensorFlow, and MXNet) show that OptimML achieves higher inference accuracy compared with several well-designed baselines, while respecting both latency and power constraints. 
Furthermore, an adaptive control scheme with online model switching and estimation is designed to achieve analytic assurance of control accuracy and system stability, even in the face of significant workload or hardware variations.<\/jats:p>","DOI":"10.1145\/3661825","type":"journal-article","created":{"date-parts":[[2024,5,7]],"date-time":"2024-05-07T11:06:41Z","timestamp":1715080001000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["OptimML: Joint Control of Inference Latency and Server Power Consumption for ML Performance Optimization"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-0995-3721","authenticated-orcid":false,"given":"Guoyu","family":"Chen","sequence":"first","affiliation":[{"name":"The Ohio State University, Columbus, OH, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9633-1418","authenticated-orcid":false,"given":"Xiaorui","family":"Wang","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus, OH, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,22]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"1","volume-title":"57th ACM\/IEEE Design Automation Conference (DAC\u201920). IEEE","author":"Abdelfattah Mohamed S.","year":"2020","unstructured":"Mohamed S. Abdelfattah, \u0141ukasz Dudziak, Thomas Chau, Royson Lee, Hyeji Kim, and Nicholas D. Lane. 2020. Best of both worlds: Automl codesign of a cnn and its hardware accelerator. In 57th ACM\/IEEE Design Automation Conference (DAC\u201920). IEEE, 1\u20136."},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","unstructured":"Shekhar Borkar and Andrew A. Chien. 2011. The future of microprocessors. 
Communications of the ACM 54 5 (2011) 67\u201377.","DOI":"10.1145\/1941487.1941507"},{"key":"e_1_3_1_4_2","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel Ziegler Jeffrey Wu Clemens Winter Chris Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems (NeurIPS) Vol. 33. Curran Associates Inc. 1877\u20131901."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.14778\/3364324.3364325"},{"key":"e_1_3_1_6_2","volume-title":"Proceedings of the 25th International Conference on Supercomputing (ICS)","author":"Chen Ming","year":"2011","unstructured":"Ming Chen, Xiaorui Wang, and Xue Li. 2011. Coordinating processor and main memory for efficient server power control. In Proceedings of the 25th International Conference on Supercomputing (ICS)."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.327"},{"key":"e_1_3_1_8_2","unstructured":"CIFAR 2021. The CIFAR-10 and CIFAR-100 Datasets Website. Retrieved from https:\/\/www.cs.toronto.edu\/~kriz\/cifar.html"},{"key":"e_1_3_1_9_2","unstructured":"Colab 2024. The Colab Website. Retrieved from https:\/\/colab.research.google.com\/"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","unstructured":"Emily L. Denton Wojciech Zaremba Joan Bruna Yann LeCun and Rob Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems (NeurIPS) Vol. 27. 1\u20139. 
Retrieved from 10.5555\/2968826.2968968","DOI":"10.5555\/2968826.2968968"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/1250662.1250699"},{"key":"e_1_3_1_12_2","unstructured":"Peter Dorato Vito Cerone and Chaouki Abdallah. 1994. Linear-Quadratic Control: An Introduction. Simon and Schuster Inc. New York NY."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2000064.2000108"},{"key":"e_1_3_1_14_2","unstructured":"Gene F. Franklin J. David Powell and Michael Workman. 1998. Digital Control of Dynamic Systems. Ellis-Kagle Press Half Moon Bay CA."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/1998582.1998589"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304011"},{"key":"e_1_3_1_18_2","volume-title":"Proceedings of the 18th ACM\/IFIP\/USENIX Middleware Conference","author":"Gujarati Arpan","year":"2017","unstructured":"Arpan Gujarati, Sameh Elnikety, Yuxiong He, Kathryn S. McKinley, and Bj\u00f6rn B. Brandenburg. 2017. Swayam: Distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency. In Proceedings of the 18th ACM\/IFIP\/USENIX Middleware Conference."},{"key":"e_1_3_1_19_2","volume-title":"Fourth International Conference on Learning Representations (ICLR\u201916)","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding. In Fourth International Conference on Learning Representations (ICLR\u201916). Yoshua Bengio and Yann LeCun (Eds.), Conference Track Proceedings. Retrieved from http:\/\/arxiv.org\/abs\/1510.00149"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_21_2","doi-asserted-by":"crossref","unstructured":"Joseph L. 
Hellerstein Yixin Diao Sujay Parekh and Dawn M. Tilbury. 2004. Feedback Control of Computing Systems. John Wiley and Sons Inc. Hoboken NJ.","DOI":"10.1002\/047166880X"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","unstructured":"Sunpyo Hong and Hyesoon Kim. 2010. An integrated GPU power and performance model. In 37th Annual International Symposium on Computer Architecture (ISCA\u201910). 280\u2013289. DOI: 10.1145\/1815961.1815998","DOI":"10.1145\/1815961.1815998"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","unstructured":"M. Horowitz E. Alon D. Patil S. Naffziger Rajesh Kumar and K. Bernstein. 2005. Scaling power and the future of CMOS. In IEEE International Electron Devices Meeting (IEDM\u201905). 7\u201315. DOI: 10.1109\/IEDM.2005.1609253","DOI":"10.1109\/IEDM.2005.1609253"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33013812"},{"key":"e_1_3_1_25_2","volume-title":"Sixth International Conference on Learning Representations (ICLR\u201918)","author":"Huang Gao","year":"2018","unstructured":"Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Q. Weinberger. 2018. Multi-scale dense networks for resource efficient image classification. In Sixth International Conference on Learning Representations (ICLR\u201918). Conference Track Proceedings. OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=Hk2aImxAb"},{"key":"e_1_3_1_26_2","first-page":"187","article-title":"Quantized neural networks: Training neural networks with low precision weights and activations","volume":"18","author":"Hubara Itay","year":"2018","unstructured":"Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2018. Quantized neural networks: Training neural networks with low precision weights and activations. 
Journal of Machine Learning Research 18, 187 (2018), 1\u201330.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_27_2","first-page":"687","volume-title":"IFAC Proceedings Volumes","volume":"17","author":"Kulhav\u00fd Rudolf","year":"1984","unstructured":"Rudolf Kulhav\u00fd and Miroslav K\u00e1rn\u00fd. 1984. Tracking of slowly varying parameters by directional forgetting. IFAC Proceedings Volumes 17, 2 (1984), 687\u2013692."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"e_1_3_1_29_2","first-page":"1","volume-title":"Third International Conference on Learning Representations (ICLR\u201915)","author":"Lebedev Vadim","year":"2015","unstructured":"Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky. 2015. Speeding-up convolutional neural networks using fine-tuned Cp-decomposition. In Third International Conference on Learning Representations (ICLR\u201915). 1\u201311."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICAC.2007.35"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3229556.3229562"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","unstructured":"Jinhua Lin Lin Ma and Yu Yao. 2019. A Fourier domain acceleration framework for convolutional neural networks. Neurocomputing 364 (2019) 254\u2013268. DOI: 10.1016\/j.neucom.2019.06.080","DOI":"10.1016\/j.neucom.2019.06.080"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","unstructured":"Chun Liu Anand Sivasubramaniam Mahmut Kandemir and Mary Jane Irwin. 2005. Exploiting barriers to optimize power consumption of CMPs. In 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS). 1\u201310. 
DOI: 10.1109\/IPDPS.2005.211","DOI":"10.1109\/IPDPS.2005.211"},{"key":"e_1_3_1_34_2","first-page":"3792","volume-title":"46th IEEE Conference on Decision and Control","author":"Liu Xue","year":"2007","unstructured":"Xue Liu, Xiaoyun Zhu, Pradeep Padala, Zhikui Wang, and Sharad Singhal. 2007. Optimal multivariate control for differentiated services on a shared hosting platform. In 46th IEEE Conference on Decision and Control (2007). 3792\u20133799."},{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"David Lo Liqun Cheng Rama Govindaraju Luiz Andr\u00e9 Barroso and Christos Kozyrakis. 2014. Towards energy proportionality for large-scale latency-critical workloads. In ACM\/IEEE 41st International Symposium on Computer Architecture (ISCA\u201914). 301\u2013312.","DOI":"10.1109\/ISCA.2014.6853237"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2012.31"},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"Arnab Neelim Mazumder Jian Meng Hasib-Al Rashid Utteja Kallakuri Xin Zhang Jae-Sun Seo and Tinoosh Mohsenin. 2021. A survey on the optimization of neural network accelerators for micro-ai on-device inference. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 11 4 (2021) 532\u2013547.","DOI":"10.1109\/JETCAS.2021.3129415"},{"key":"e_1_3_1_38_2","unstructured":"Model Slicing 2021. The Model Slicing Github Website. Retrieved from https:\/\/github.com\/ooibc88\/modelslicing"},{"key":"e_1_3_1_39_2","unstructured":"OpenAI. 2023. GPT-4 technical report. arxiv:2303.08774."},{"key":"e_1_3_1_40_2","first-page":"5113","volume-title":"36th Proceedings of International Conference on Machine Learning Research (PMLR\u201919)","volume":"97","author":"Peng Hanyu","year":"2019","unstructured":"Hanyu Peng, Jiaxiang Wu, Shifeng Chen, and Junzhou Huang. 2019. Collaborative channel pruning for deep networks. In 36th Proceedings of International Conference on Machine Learning Research (PMLR\u201919). Vol. 97. 
5113\u20135122."},{"key":"e_1_3_1_41_2","volume-title":"Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)","author":"Raghavendra Ramya","year":"2008","unstructured":"Ramya Raghavendra, Parthasarathy Ranganathan, Vanish Talwar, Zhikui Wang, and Xiaoyun Zhu. 2008. No power struggles: Coordinated multi-level power management for the data center. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2006.20"},{"key":"e_1_3_1_43_2","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1007\/978-3-030-66770-2_22","volume-title":"IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning: Second International Workshop, IoT Streams 2020, and First International Workshop, ITEM 2020, Co-located with ECML\/PKDD 2020","author":"Rusci Manuele","year":"2020","unstructured":"Manuele Rusci, Marco Fariselli, Alessandro Capotondi, and Luca Benini. 2020. Leveraging automated mixed-low-precision quantization for tiny edge microcontrollers. In IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning: Second International Workshop, IoT Streams 2020, and First International Workshop, ITEM 2020, Co-located with ECML\/PKDD 2020. Revised Selected Papers 2. Springer, 296\u2013308."},{"key":"e_1_3_1_44_2","doi-asserted-by":"crossref","first-page":"3067","DOI":"10.1145\/3543507.3583437","volume-title":"Proceedings of the ACM Web Conference 2023 (Austin, TX, USA) (WWW \u201923)","author":"Savasci Mehmet","year":"2023","unstructured":"Mehmet Savasci, Ahmed Ali-Eldin, Johan Eker, Anders Robertsson, and Prashant Shenoy. 2023. DDPC: Automated data-driven power-performance controller design on-the-fly for latency-sensitive web services. 
In Proceedings of the ACM Web Conference 2023 (Austin, TX, USA) (WWW \u201923). Association for Computing Machinery, 3067\u20133076."},{"key":"e_1_3_1_45_2","volume-title":"5th International Conference on Learning Representations (ICLR\u201917)","author":"Shazeer Noam","year":"2017","unstructured":"Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, and Jeff Dean. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In 5th International Conference on Learning Representations (ICLR\u201917). Conference Track Proceedings. OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=B1ckMDqlg"},{"key":"e_1_3_1_46_2","unstructured":"Tensorflow Lite. 2024. The TensorFlow Lite Website. Retrieved from https:\/\/www.tensorflow.org\/lite"},{"key":"e_1_3_1_47_2","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 30. Curran Associates, Inc."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCECE51280.2021.9342343"},{"key":"e_1_3_1_49_2","volume-title":"Proceedings of the 14th IEEE International Symposium on High-Performance Computer Architecture (HPCA)","author":"Wang Xiaorui","year":"2008","unstructured":"Xiaorui Wang and Ming Chen. 2008. Cluster-level feedback power control for performance optimization. In Proceedings of the 14th IEEE International Symposium on High-Performance Computer Architecture (HPCA)."},{"key":"e_1_3_1_50_2","volume-title":"Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT)","author":"Wang Xiaorui","year":"2009","unstructured":"Xiaorui Wang, Ming Chen, Charles Lefurgy, and Tom W. 
Keller. 2009a. SHIP: Scalable hierarchical power control for large-scale data centers. In Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_3_1_51_2","volume-title":"Proceedings of the 15th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)","author":"Wang Xiaorui","year":"2009","unstructured":"Xiaorui Wang, Xing Fu, Xue Liu, and Zonghua Gu. 2009b. Power-aware CPU utilization control for distributed real-time systems. In Proceedings of the 15th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS)."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_25"},{"key":"e_1_3_1_53_2","first-page":"1","volume-title":"32nd Conference on Neural Information Processing Systems (NIPS\u201918)","author":"Wang Yue","year":"2018","unstructured":"Yue Wang, Tan Nguyen, Yang Zhao, Zhangyang Wang, Yingyan Lin, and Richard Baraniuk. 2018a. EnergyNet: Energy-efficient dynamic inference. In 32nd Conference on Neural Information Processing Systems (NIPS\u201918). 1\u20135."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS.2008.20"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.643"},{"key":"e_1_3_1_56_2","first-page":"1","volume-title":"8th International Conference on Learning Representations (ICLR\u201920)","author":"You Haoran","year":"2020","unstructured":"Haoran You, Chaojian Li, Pengfei Xu, Yonggan Fu, Yue Wang, Richard G. Baraniuk, and Yingyan Lin. 2020. Drawing early-bird tickets: Towards more efficient training of deep networks. In 8th International Conference on Learning Representations (ICLR\u201920). 1\u201313."},{"key":"e_1_3_1_57_2","doi-asserted-by":"crossref","unstructured":"P. C. Young and J. C. Willems. 1972. An approach to the linear multivariable servomechanism problem. 
International Journal of Control 15 5 (1972) 961\u2013979.","DOI":"10.1080\/00207177208932211"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","unstructured":"Qianru Zhang Meng Zhang Tinghuan Chen Zhifei Sun Yuzhe Ma and Bei Yu. 2019. Recent advances in convolutional neural network acceleration. Neurocomputing 323 (2019) 37\u201351. DOI: 10.1016\/j.neucom.2018.09.038","DOI":"10.1016\/j.neucom.2018.09.038"}],"container-title":["ACM Transactions on Autonomous and Adaptive Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3661825","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T07:52:02Z","timestamp":1776844322000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661825"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,22]]},"references-count":57,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,6,30]]}},"alternative-id":["10.1145\/3661825"],"URL":"https:\/\/doi.org\/10.1145\/3661825","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"value":"1556-4665","type":"print"},{"value":"1556-4703","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,4,22]]},"assertion":[{"value":"2023-09-28","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-22","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-04-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}