{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,4]],"date-time":"2026-07-04T16:48:23Z","timestamp":1783183703280,"version":"3.54.6"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,11,19]]},"DOI":"10.1145\/3772052.3772217","type":"proceedings-article","created":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T16:19:00Z","timestamp":1768321140000},"page":"748-761","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["SneakPeek: Data-Aware Model Selection and Scheduling for Inference Serving on the Edge"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7251-3740","authenticated-orcid":false,"given":"Joel","family":"Wolfrath","sequence":"first","affiliation":[{"name":"University of Minnesota, Minneapolis, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8197-0536","authenticated-orcid":false,"given":"Daniel","family":"Frink","sequence":"additional","affiliation":[{"name":"University of Minnesota, Minneapolis, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9914-2604","authenticated-orcid":false,"given":"Abhishek","family":"Chandra","sequence":"additional","affiliation":[{"name":"University of Minnesota, Minneapolis, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2026,1,13]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3617232.3624849"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3625549.3658688"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Ganesh Ananthanarayanan et al. 2017. 
Real-Time Video Analytics: The Killer App for Edge Computing. Computer (2017).","DOI":"10.1109\/MC.2017.3641638"},{"key":"e_1_3_2_1_4_1","volume-title":"2024 IEEE\/ACM 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 327\u2013336","author":"Silva Tiago Da","unstructured":"Tiago Da Silva Barros et al. 2024. Scheduling with Fully Compressible Tasks: Application to Deep Learning Inference with Neural Network Compression. In 2024 IEEE\/ACM 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 327\u2013336."},{"key":"e_1_3_2_1_5_1","volume-title":"Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Romil","unstructured":"Romil Bhardwaj et al. 2022. Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA, 119\u2013135."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1161\/JAHA.121.023222","article-title":"Real-Time Arrhythmia Detection Using Hybrid Convolutional Neural Networks","volume":"10","author":"Bollepalli S. C.","year":"2021","unstructured":"S. C. Bollepalli, R. K. Sevakula, W. M. Au-Yeung, M. B. Kassab, F. M. Merchant, G. Bazoukis, R. Boyer, E. M. Isselbacher, and A. A. Armoundas. 2021. Real-Time Arrhythmia Detection Using Hybrid Convolutional Neural Networks. J Am Heart Assoc 10, 23 (Dec 2021), e023222.","journal-title":"J Am Heart Assoc"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/0207031"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2020.103949"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4613-0303-9_25"},{"key":"e_1_3_2_1_10_1","unstructured":"Hyeongju Choi Apoorva Beedu Harish Haresamudram and Irfan Essa. 2022. 
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition. arXiv:2211.04331 [cs.CV]"},{"key":"e_1_3_2_1_11_1","volume-title":"Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17)","author":"Crankshaw Daniel","year":"2017","unstructured":"Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. 2017. Clipper: A Low-Latency Online Prediction Serving System. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association, Boston, MA, 613\u2013627."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3353138"},{"key":"e_1_3_2_1_13_1","volume-title":"Webb","author":"Dempster Angus","year":"2021","unstructured":"Angus Dempster, Daniel F. Schmidt, and Geoffrey I. Webb. 2021. MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification (KDD 21). Association for Computing Machinery, New York, NY, USA, 248\u2013257."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Matthijs Douze Alexandr Guzhva Chengqi Deng Jeff Johnson Gergely Szilvasy Pierre-Emmanuel Mazar\u00e9 Maria Lomeli Lucas Hosseini and Herv\u00e9 J\u00e9gou. 2024. The Faiss library. (2024). arXiv:2401.08281 [cs.LG]","DOI":"10.1109\/TBDATA.2025.3618474"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3304109.3306221"},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management","author":"Fang Zhou","year":"2017","unstructured":"Zhou Fang, Tong Yu, Ole J. Mengshoel, and Rajesh K. Gupta. 2017. QoS-Aware Scheduling of Heterogeneous Servers for Inference in Deep Neural Networks. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (Singapore, Singapore) (CIKM 17). 
Association for Computing Machinery, New York, NY, USA, 2067\u20132070."},{"key":"e_1_3_2_1_17_1","volume-title":"2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society","author":"Feichtenhofer C.","year":"2020","unstructured":"C. Feichtenhofer. 2020. X3D: Expanding Architectures for Efficient Video Recognition. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, USA, 200\u2013210."},{"key":"e_1_3_2_1_18_1","volume-title":"Offloading Algorithms for Maximizing Inference Accuracy on Edge Device Under a Time Constraint. CoRR abs\/2112.11413","author":"Fresa Andrea","year":"2021","unstructured":"Andrea Fresa and Jaya Prakash Champati. 2021. Offloading Algorithms for Maximizing Inference Accuracy on Edge Device Under a Time Constraint. CoRR abs\/2112.11413 (2021). arXiv:2112.11413 https:\/\/arxiv.org\/abs\/2112.11413"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","first-page":"110","DOI":"10.2345\/0899-8205-53.2.110","article-title":"Identifying and Monitoring Respiratory Compromise: Report from the Rules and Algorithms Working Group","volume":"53","author":"Friedman Bruce","year":"2019","unstructured":"Bruce Friedman, Daniel Fuckert, Mary Jahrsdoerfer, Rochelle Magness, Emily S. Patterson, Rehman Syed, and John R. Zaleski. 2019. Identifying and Monitoring Respiratory Compromise: Report from the Rules and Algorithms Working Group. Biomedical Instrumentation & Technology 53, 2 (2019), 110\u2013123.","journal-title":"Biomedical Instrumentation & Technology"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1161\/01.CIR.101.23.e215"},{"key":"e_1_3_2_1_21_1","unstructured":"Margherita Grandini Enrico Bagli and Giorgio Visani. 2020. Metrics for Multi-Class Classification: an Overview. 
arXiv:2008.05756 [stat.ML] https:\/\/arxiv.org\/abs\/2008.05756"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526173"},{"key":"e_1_3_2_1_23_1","volume-title":"2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","author":"Halpern M.","unstructured":"M. Halpern, B. Boroujerdian, T. Mummert, E. Duesterwald, and V. Janapa Reddi. 2019. One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-Off in Machine Learning Cloud Service APIs via Tolerance Tiers. In 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Los Alamitos, CA, USA."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2024.3430063"},{"key":"e_1_3_2_1_25_1","volume-title":"Sarhan","author":"Hassan Mohammed K.","year":"2019","unstructured":"Mohammed K. Hassan, Ali I. El Desouky, Sally M. Elghamrawy, and Amany M. Sarhan. 2019. Big Data Challenges and Opportunities in Healthcare Informatics and Smart Hospitals. Springer International Publishing, Cham."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00059"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403212"},{"key":"e_1_3_2_1_28_1","volume-title":"MOSEL: Inference Serving Using Dynamic Modality Selection. arXiv:2310.18481 [cs.LG]","author":"Hu Bodun","year":"2023","unstructured":"Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, and Aditya Akella. 2023. MOSEL: Inference Serving Using Dynamic Modality Selection. arXiv:2310.18481 [cs.LG]"},{"key":"e_1_3_2_1_29_1","volume-title":"Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication","author":"Junchen","unstructured":"Junchen Jiang et al. 2018. Chameleon: Scalable Adaptation of Video Analytics. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (Budapest, Hungary) (SIGCOMM '18). 
Association for Computing Machinery, New York, NY, USA, 253\u2013266."},{"key":"e_1_3_2_1_30_1","volume-title":"Daeyoun Kang, Dohyeun Kim, Daeyoung Kim, and Young-Hak Kim.","author":"Jun Tae Joon","year":"2018","unstructured":"Tae Joon Jun, Hoang Minh Nguyen, Daeyoun Kang, Dohyeun Kim, Daeyoung Kim, and Young-Hak Kim. 2018. ECG arrhythmia classification using a 2-D convolutional neural network. CoRR abs\/1804.06812 (2018). http:\/\/arxiv.org\/abs\/1804.06812"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3037697.3037698"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00019"},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV).","author":"Kong Quan","year":"2019","unstructured":"Quan Kong, Ziming Wu, Ziwei Deng, Martin Klinkigt, Bin Tong, and Tomokazu Murakami. 2019. MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589974"},{"key":"e_1_3_2_1_35_1","unstructured":"ChonLam Lao Jiaqi Gao Ganesh Ananthanarayanan Aditya Akella and Minlan Yu. 2024. HawkVision: Low-Latency Modeless Edge AI Serving. arXiv:2405.19213 [eess.SY] https:\/\/arxiv.org\/abs\/2405.19213"},{"key":"e_1_3_2_1_36_1","volume-title":"Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing","author":"Li Baolin","year":"2023","unstructured":"Baolin Li, Siddharth Samsi, Vijay Gadepally, and Devesh Tiwari. 2023. Kairos: Building Cost-Efficient Machine Learning Inference Systems with Heterogeneous Cloud Resources. In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (Orlando, FL, USA) (HPDC '23). 
Association for Computing Machinery, 3\u201316."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3625549.3658654"},{"key":"e_1_3_2_1_38_1","article-title":"Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection","volume":"19","author":"Marco Vicent Sanz","year":"2020","unstructured":"Vicent Sanz Marco, Ben Taylor, Zheng Wang, and Yehia Elkhatib. 2020. Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection. ACM Trans. Embed. Comput. Syst. 19, 1, Article 2 (feb 2020), 28 pages.","journal-title":"ACM Trans. Embed. Comput. Syst."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437984.3458837"},{"key":"e_1_3_2_1_40_1","volume-title":"Proceedings of the Nineteenth European Conference on Computer Systems (EuroSys '24)","author":"Mendoza Daniel","year":"2024","unstructured":"Daniel Mendoza, Francisco Romero, and Caroline Trippel. 2024. Model Selection for Latency-Critical Inference Serving. In Proceedings of the Nineteenth European Conference on Computer Systems (EuroSys '24). Association for Computing Machinery, New York, NY, USA, 1016\u20131038."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTSS55097.2022.00032"},{"key":"e_1_3_2_1_42_1","volume-title":"LayerCake: Efficient Inference Serving with Cloud and Mobile Resources. In 2023 23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid).","author":"Ogden Samuel","year":"2023","unstructured":"Samuel Ogden and Tian Guo. 2023. LayerCake: Efficient Inference Serving with Cloud and Mobile Resources. In 2023 23rd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)."},{"key":"e_1_3_2_1_43_1","volume-title":"Olatunji and Chun-Hung Cheng","author":"Iyiola","year":"2019","unstructured":"Iyiola E. Olatunji and Chun-Hung Cheng. 2019. Video Analytics for Visual Surveillance and Applications: An Overview and Survey. 
Springer."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1109\/RBME.2020.3008792","article-title":"Machine Learning for Predicting Epileptic Seizures Using EEG Signals: A Review","volume":"14","author":"Khansa Rasheed","year":"2021","unstructured":"Khansa Rasheed et al. 2021. Machine Learning for Predicting Epileptic Seizures Using EEG Signals: A Review. IEEE Reviews in Biomedical Engineering 14 (2021), 139\u2013155.","journal-title":"IEEE Reviews in Biomedical Engineering"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3642970.3655833"},{"key":"e_1_3_2_1_46_1","volume-title":"INFaaS: Automated Model-less Inference Serving. In 2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Romero Francisco","year":"2021","unstructured":"Francisco Romero, Qian Li, Neeraja J. Yadwadkar, and Christos Kozyrakis. 2021. INFaaS: Automated Model-less Inference Serving. In 2021 USENIX Annual Technical Conference (USENIX ATC 21). USENIX Association, 397\u2013411."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460352"},{"key":"e_1_3_2_1_48_1","volume-title":"On-demand Edge Inference Scheduling with Accuracy and Deadline Guarantee. In 2023 IEEE\/ACM 31st International Symposium on Quality of Service (IWQoS). 1\u201310","author":"Yechao","unstructured":"Yechao She et al. 2023. On-demand Edge Inference Scheduling with Accuracy and Deadline Guarantee. In 2023 IEEE\/ACM 31st International Symposium on Quality of Service (IWQoS). 1\u201310."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","unstructured":"Mohammad Khubeb Siddiqui et al. 2020. A review of epileptic seizure detection using machine learning classifiers. Brain Informatics 7 1 (25 May 2020) 5. 
https:\/\/doi.org\/10.1186\/s40708-020-00105-1","DOI":"10.1186\/s40708-020-00105-1"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.nlposs-1.9"},{"key":"e_1_3_2_1_51_1","volume-title":"Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing","author":"Achilleas","unstructured":"Achilleas Tzenetopoulos et al. 2024. Seamless HW-accelerated AI serving in heterogeneous MEC Systems with AI@EDGE. In Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (Pisa, Italy) (HPDC '24). Association for Computing Machinery, New York, NY, USA, 377\u2013380."},{"key":"e_1_3_2_1_52_1","volume-title":"Proceedings of the 32nd Workshop on Network and Operating Systems Support for Digital Audio and Video. Association for Computing Machinery","author":"Xuezhi","unstructured":"Xuezhi Wang et al. 2022. Dynamic DNN Model Selection and Inference off Loading for Video Analytics with Edge-Cloud Collaboration. In Proceedings of the 32nd Workshop on Network and Operating Systems Support for Digital Audio and Video. Association for Computing Machinery, New York, NY, USA, 64\u201370."},{"key":"e_1_3_2_1_53_1","volume-title":"Speech Commands: A public dataset for single-word speech recognition.","author":"Warden Pete","year":"2017","unstructured":"Pete Warden. 2017. Speech Commands: A public dataset for single-word speech recognition. (2017)."},{"key":"e_1_3_2_1_54_1","volume-title":"Leveraging Multi-Modal Data for Efficient Edge Inference Serving. In 2024 IEEE\/ACM 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 408\u2013417","author":"Wolfrath Joel","year":"2024","unstructured":"Joel Wolfrath, Anirudh Achanta, and Abhishek Chandra. 2024. Leveraging Multi-Modal Data for Efficient Edge Inference Serving. In 2024 IEEE\/ACM 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 
408\u2013417."},{"key":"e_1_3_2_1_55_1","volume-title":"SLO-Aware Machine Learning Inference Serving. In 2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Zhang Chengliang","year":"2019","unstructured":"Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan. 2019. MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). USENIX Association, Renton, WA, 1049\u20131062."},{"key":"e_1_3_2_1_56_1","volume-title":"Octopus: SLO-Aware Progressive Inference Serving via Deep Reinforcement Learning in Multi-tenant Edge Cluster","author":"Zhang Ziyang","year":"2023","unstructured":"Ziyang Zhang, Yang Zhao, and Jie Liu. 2023. Octopus: SLO-Aware Progressive Inference Serving via Deep Reinforcement Learning in Multi-tenant Edge Cluster. In Service-Oriented Computing, Flavia Monti, Stefanie Rinderle-Ma, Antonio Ruiz Cort\u00e9s, Zibin Zheng, and Massimo Mecella (Eds.). Springer Nature Switzerland, Cham, 242\u2013258."},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"crossref","first-page":"5870","DOI":"10.1109\/TMC.2022.3189186","article-title":"EdgeAdaptor: Online Configuration Adaption, Model Selection and Resource Provisioning for Edge DNN Inference Serving at Scale","volume":"22","author":"Kongyange Zhao","year":"2023","unstructured":"Kongyange Zhao et al. 2023. EdgeAdaptor: Online Configuration Adaption, Model Selection and Resource Provisioning for Edge DNN Inference Serving at Scale. 
IEEE Transactions on Mobile Computing 22, 10 (2023), 5870\u20135886.","journal-title":"IEEE Transactions on Mobile Computing"}],"event":{"name":"SoCC '25: ACM Symposium on Cloud Computing","location":"Online USA","acronym":"SoCC '25","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems","SIGMOD ACM Special Interest Group on Management of Data"]},"container-title":["Proceedings of the 2025 ACM Symposium on Cloud Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3772052.3772217","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T16:23:10Z","timestamp":1768321390000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3772052.3772217"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,19]]},"references-count":57,"alternative-id":["10.1145\/3772052.3772217","10.1145\/3772052"],"URL":"https:\/\/doi.org\/10.1145\/3772052.3772217","relation":{},"subject":[],"published":{"date-parts":[[2025,11,19]]},"assertion":[{"value":"2026-01-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}