{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T11:26:18Z","timestamp":1779103578626,"version":"3.51.4"},"reference-count":147,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,6,1]],"date-time":"2025-06-01T00:00:00Z","timestamp":1748736000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,6]],"date-time":"2025-07-06T00:00:00Z","timestamp":1751760000000},"content-version":"vor","delay-in-days":35,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"US DEVCOM Army Research Labs","award":["W911NF-17-2-0196"],"award-info":[{"award-number":["W911NF-17-2-0196"]}]},{"name":"NSF","award":["20-38817"],"award-info":[{"award-number":["20-38817"]}]},{"name":"Boeing Inc."},{"name":"ACE, one of the seven centers in JUMP 2.0, a Semiconductor Research Corporation (SRC) program sponsored by DARPA"}],
"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Real-Time Syst"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Recent advances in AI culminate a shift in science and engineering away from strong reliance on algorithmic and symbolic knowledge towards new data-driven approaches. How does the emerging intelligent data-centric world impact research on real-time and embedded computing? We argue for two effects: (1) new challenges in embedded system contexts, and (2) new opportunities for community expansion beyond the embedded domain. First, <jats:italic>on the embedded system side<\/jats:italic>, the shifting nature of computing towards <jats:italic>data-centricity<\/jats:italic> affects the types of bottlenecks that arise. At training time, the bottlenecks are generally <jats:italic>data-related<\/jats:italic>. Embedded computing relies on <jats:italic>scarce<\/jats:italic> sensor data modalities, unlike those commonly addressed in mainstream AI, necessitating solutions for <jats:italic>efficient learning<\/jats:italic> from scarce sensor data. 
At inference time, the bottlenecks are <jats:italic>resource-related<\/jats:italic>, calling for <jats:italic>improved resource economy<\/jats:italic> and <jats:italic>novel scheduling policies<\/jats:italic>. Further ahead, the convergence of AI around large language models (LLMs) introduces additional <jats:italic>model-related<\/jats:italic> challenges in embedded contexts. Second, <jats:italic>on the domain expansion side<\/jats:italic>, we argue that community expertise in handling resource bottlenecks is becoming increasingly relevant to a new domain: the <jats:italic>cloud<\/jats:italic> environment, driven by AI needs. The paper discusses the novel research directions that arise in the data-centric world of AI, covering data-, resource-, and model-related challenges in embedded systems as well as new opportunities in the cloud domain.<\/jats:p>","DOI":"10.1007\/s11241-025-09452-w","type":"journal-article","created":{"date-parts":[[2025,7,6]],"date-time":"2025-07-06T02:28:19Z","timestamp":1751768899000},"page":"185-236","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["The bottlenecks of AI: challenges for embedded and real-time research in a data-centric 
age"],"prefix":"10.1007","volume":"61","author":[{"given":"Tarek","family":"Abdelzaher","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yigong","family":"Hu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Denizhan","family":"Kara","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tomoyoshi","family":"Kimura","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ashitabh","family":"Misra","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vishakha","family":"Ramani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Olivier","family":"Tardieu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tianshi","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maggie","family":"Wigness","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alaa","family":"Youssef","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,7,6]]},"reference":[{"issue":"3","key":"9452_CR2","doi-asserted-by":"publisher","first-page":"348","DOI":"10.1007\/s11241-023-09395-0","volume":"59","author":"T Abdelzaher","year":"2023","unstructured":"Abdelzaher T, Agrawal K, Baruah S, Burns A, Davis RI, Guo Z, Hu Y (2023) Scheduling idk classifiers with arbitrary dependences to minimize the expected time to successful classification. 
Real-Time Syst 59(3):348\u2013407","journal-title":"Real-Time Syst"},{"key":"9452_CR3","unstructured":"Achiam J, Adler S, Agarwal S, Ahmad L, Akkaya I, Aleman FL, Almeida D, Altenschmidt J, Altman S, Anadkat S (2023) Gpt-4 technical report, arXiv preprint arXiv:2303.08774"},{"key":"9452_CR4","unstructured":"Agrawal A, Kedia N, Panwar A, Mohan J, Kwatra N, Gulavani B, Tumanov A, Ramjee R (2024) Taming Throughput-Latency tradeoff in LLM inference with Sarathi-Serve. In: 18th USENIX symposium on operating systems design and implementation (OSDI 24), pp 117\u2013134"},{"key":"9452_CR5","unstructured":"Agrawal A, Kedia N, Panwar A, Mohan J, Kwatra N, Gulavani B, Tumanov A, Ramjee R (2024) Taming Throughput-Latency tradeoff in LLM inference with Sarathi-Serve. In: 18th USENIX symposium on operating systems design and implementation (OSDI 24), pp 117\u2013134"},{"key":"9452_CR6","unstructured":"DeepSeek-AI (2025) DeepSeek-R1: Incentivizing reasoning capability in llms via reinforcement learning. [Online]. Available: https:\/\/arxiv.org\/abs\/2501.12948"},{"key":"9452_CR7","unstructured":"Ashkboos S, Verhoef B, Hoefler T, Eleftheriou E, Dazzi M (2024) Efqat: An efficient framework for quantization-aware training, arXiv preprint arXiv:2411.11038"},{"key":"9452_CR1","unstructured":"Bamba-9B-v2 (2025)\u2014Fast and powerful!. https:\/\/huggingface.co\/blog\/ibm-ai-platform\/bamba-9b-v2"},{"key":"9452_CR8","unstructured":"Bandara WGC, Patel N, Gholami A, Nikkhah M, Agrawal M, Patel VM (2022) Adamae: Adaptive masking for efficient spatiotemporal learning with masked autoencoders. In: 2023 IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 14\u00a0507\u201314\u00a0517. [Online]. 
Available: https:\/\/consensus.app\/papers\/adamae-adaptive-masking-efficient-spatiotemporal-bandara\/a7fbcc1444dc5d6d94e2198544cc52b8\/?utm_source=chatgpt"},{"key":"9452_CR9","unstructured":"Baris O, Chen Y, Dong G, Han L, Kimura T, Quan P, Wang R, Wang T, Abdelzaher T, Berg\u00e9s M et\u00a0al. (2025) Foundation models for cps-iot: Opportunities and challenges, arXiv preprint arXiv:2501.16368"},{"issue":"1","key":"9452_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s11241-022-09383-w","volume":"59","author":"S Baruah","year":"2023","unstructured":"Baruah S, Burns A, Davis RI, Wu Y (2023) Optimally ordering idk classifiers subject to deadlines. Real-Time Syst 59(1):1\u201334","journal-title":"Real-Time Syst"},{"key":"9452_CR11","doi-asserted-by":"crossref","unstructured":"Baruah S, Burns A, Wu Y (2021) Optimal synthesis of idk-cascades. In: Proceedings of the 29th international conference on real-time networks and systems, pp 184\u2013191","DOI":"10.1145\/3453417.3453425"},{"key":"9452_CR12","unstructured":"Bommasani R, Hudson DA, Adeli E, Altman R, Arora S, Arx S\u00a0von, Bernstein MS, Bohg J, Bosselut A, Brunskill E et\u00a0al. (2021) On the opportunities and risks of foundation models, arXiv preprint arXiv:2108.07258"},{"key":"9452_CR13","doi-asserted-by":"crossref","unstructured":"Bulat A, Tzimiropoulos G (2021) Bit-mixer: Mixed-precision networks with runtime bit-width selection. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 5188\u20135197","DOI":"10.1109\/ICCV48922.2021.00514"},{"key":"9452_CR14","unstructured":"Burns J, Chang L, Martineau K (2022) Meet the IBM artificial intelligence unit. IBM Research. Available: https:\/\/research.ibm.com\/blog\/ibm-artificial-intelligence-unit-aiu"},{"key":"9452_CR15","unstructured":"Chen D, Huang Y, Wu S, Tang J, Chen L, Bai Y, He Z, Wang C, Zhou H, Li Y et\u00a0al. 
(2024) Gui-world: a dataset for gui-oriented multimodal llm-based agents, CoRR"},{"key":"9452_CR16","unstructured":"Chen D, Youssef A et\u00a0al. (2024) Transforming the hybrid cloud for emerging ai workloads. Available: https:\/\/arxiv.org\/abs\/2411.13239"},{"key":"9452_CR17","unstructured":"Choi J, Wang Z, Venkataramani S, Chuang P\u00a0I-Jen, Srinivasan V, Gopalakrishnan K (2018) PACT: parameterized clipping activation for quantized neural networks, arXiv e-prints, p. arXiv:1805.06085"},{"issue":"9","key":"9452_CR18","doi-asserted-by":"publisher","first-page":"5728","DOI":"10.1109\/TII.2022.3155656","volume":"18","author":"S De","year":"2022","unstructured":"De S, Bermudez-Edo M, Xu H, Cai Z (2022) Deep generative models in the industrial internet of things: a survey. IEEE Trans Ind Inf 18(9):5728\u20135737","journal-title":"IEEE Trans Ind Inf"},{"key":"9452_CR19","unstructured":"D\u00e9fossez A, Mazar\u00e9 L, Orsini M, Royer A, P\u00e9rez P, J\u00e9gou H, Grave E, Zeghidour N (2024) Moshi: a speech-text foundation model for real-time dialogue. arXiv preprint arXiv:2410.00037"},{"key":"9452_CR20","unstructured":"Devlin J (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805"},{"key":"9452_CR21","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et\u00a0al. (2020) An image is worth 16x16 words: transformers for image recognition at scale, arXiv preprint arXiv:2010.11929"},{"key":"9452_CR22","doi-asserted-by":"crossref","unstructured":"Fathullah Y, Wu C, Lakomkin E, Li K, Jia J, Shangguan Y, Mahadeokar J, Kalinli O, Fuegen C, Seltzer M (2024) Audiochatllama: towards general-purpose speech abilities for llms. 
In: Proceedings of the 2024 Conference of the North American chapter of the association for computational linguistics: human language technologies (Volume 1: Long Papers), pp 5522\u20135532","DOI":"10.18653\/v1\/2024.naacl-long.309"},{"key":"9452_CR23","unstructured":"Fu C, Lin H, Wang X, Zhang Y-F, Shen Y, Liu X, Cao H, Long Z, Gao H, Li K et\u00a0al. (2025) Vita-1.5: Towards gpt-4o level real-time vision and speech interaction, arXiv preprint arXiv:2501.01957"},{"key":"9452_CR24","unstructured":"Fu Y, Zhu S, Su R, Qiao A, Stoica I, Zhang H (2025) Efficient LLM scheduling by learning to rank. In: The thirty-eighth annual conference on neural information processing systems"},{"issue":"2","key":"9452_CR25","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1007\/s11263-023-01891-x","volume":"132","author":"P Gao","year":"2024","unstructured":"Gao P, Geng S, Zhang R, Ma T, Fang R, Zhang Y, Li H, Qiao Y (2024) Clip-adapter: better vision-language models with feature adapters. Int J Comput Vis 132(2):581\u2013595","journal-title":"Int J Comput Vis"},{"key":"9452_CR26","unstructured":"Gao Z, Li L, Xu T (2023) Data augmentation for time-series classification: an extensive empirical study and comprehensive survey, arXiv preprint arXiv:2310.10060"},{"key":"9452_CR27","unstructured":"Gao J, Song X, Wen Q, Wang P, Sun L, Xu H (2020) Robusttad: Robust time series anomaly detection via decomposition and convolutional neural networks, arXiv preprint arXiv:2002.09545"},{"key":"9452_CR28","unstructured":"Ge S, Zhang Y, Liu L, Zhang M, Han J, Gao J (2023) Model tells you what to discard: Adaptive kv cache compression for llms. In: The twelfth international conference on learning representations"},{"key":"9452_CR29","doi-asserted-by":"crossref","unstructured":"Gholami A, Kim S, Dong Z, Yao Z, Mahoney MW, Keutzer K (2022) A survey of quantization methods for efficient neural network inference. 
In: Low-power computer vision","DOI":"10.1201\/9781003162810-13"},{"key":"9452_CR30","unstructured":"Glorioso P, Anthony Q, Tokpanov Y, Whittington J, Pilault J, Ibrahim A, Millidge B (2024) Zamba: A compact 7b ssm hybrid model. Available: https:\/\/arxiv.org\/abs\/2405.16712"},{"key":"9452_CR31","doi-asserted-by":"crossref","unstructured":"Gokarn I, Sabbella H, Hu Y, Abdelzaher T, Misra A (2023) Mosaic: spatially-multiplexed edge ai optimization over multiple concurrent video sensing streams. In: Proceedings of the 14th conference on ACM multimedia systems, pp. 278\u2013288","DOI":"10.1145\/3587819.3590986"},{"key":"9452_CR32","doi-asserted-by":"crossref","unstructured":"Gong Y, Chung Y-A, Glass J (2021) Ast: Audio spectrogram transformer, arXiv preprint arXiv:2104.01778","DOI":"10.21437\/Interspeech.2021-698"},{"key":"9452_CR33","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst, 27"},{"key":"9452_CR34","doi-asserted-by":"crossref","unstructured":"Gopinath S, Ghanathe N, Seshadri V, Sharma R (2019) Compiling kb-sized machine learning models to tiny iot devices. In: PLDI. Available: https:\/\/www.microsoft.com\/en-us\/research\/publication\/compiling-kb-sized-machine-learning-models-to-constrained-hardware\/","DOI":"10.1145\/3314221.3314597"},{"key":"9452_CR35","unstructured":"Grattafiori A, Dubey A, Jauhri A, Pandey A, Kadian A, Al-Dahle A, Letman A, Mathur A, Schelten A, Vaughan A et\u00a0al. (2024) The llama 3 herd of models, arXiv preprint arXiv:2407.21783"},{"key":"9452_CR36","unstructured":"Greenewald K, Lastras L, Parnell T, Shah V, Popa L, Zizzo G, Gunasekara C, Rawat A, Cox D (2025) Activated LoRA: Fine-tuned llms for intrinsics, Available: https:\/\/arxiv.org\/abs\/2504.12397"},{"key":"9452_CR37","unstructured":"Gu A, Dao T (2024) Mamba: Linear-time sequence modeling with selective state spaces. 
Available: https:\/\/arxiv.org\/abs\/2312.00752"},{"issue":"1","key":"9452_CR38","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1145\/1656274.1656278","volume":"11","author":"M Hall","year":"2009","unstructured":"Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The weka data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10\u201318","journal-title":"ACM SIGKDD Explor Newsl"},{"key":"9452_CR39","unstructured":"Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding"},{"key":"9452_CR40","first-page":"174","volume":"2020","author":"S Heo","year":"2020","unstructured":"Heo S, Cho S, Kim Y, Kim H (2020) Real-time object detection system with multi-path neural networks. IEEE Real-Time Embedded Technol Appl Symp (RTAS) 2020:174\u2013187","journal-title":"IEEE Real-Time Embedded Technol Appl Symp (RTAS)"},{"key":"9452_CR41","unstructured":"Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. Available: https:\/\/arxiv.org\/abs\/1503.02531"},{"issue":"6","key":"9452_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3295748","volume":"51","author":"MZ Hossain","year":"2019","unstructured":"Hossain MZ, Sohel F, Shiratuddin MF, Laga H (2019) A comprehensive survey of deep learning for image captioning. ACM Comput Surv (CSUR) 51(6):1\u201336","journal-title":"ACM Comput Surv (CSUR)"},{"issue":"2","key":"9452_CR43","first-page":"3","volume":"1","author":"EJ Hu","year":"2022","unstructured":"Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, Wang L, Chen W et al. (2022) Lora: Low-rank adaptation of large language models. 
ICLR 1(2):3","journal-title":"ICLR"},{"issue":"4","key":"9452_CR44","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1007\/s11241-022-09387-6","volume":"58","author":"Y Hu","year":"2022","unstructured":"Hu Y, Liu S, Abdelzaher T, Wigness M, David P (2022) Real-time task scheduling with image resizing for criticality-based machine perception. Real-Time Syst 58(4):430\u2013455","journal-title":"Real-Time Syst"},{"key":"9452_CR45","unstructured":"Huang PY, Xu H, Li J, Baevski A, Auli M, Galuba W, Metze F, Feichtenhofer C (2022) Masked autoencoders that listen, arXiv preprint arXiv:2207.06405"},{"key":"9452_CR46","unstructured":"Huang Y, Cheng Y, Bapna A, Firat O, Chen D, Chen M, Lee H, Ngiam J, Le QV, Wu Y (2019) Gpipe: efficient training of giant neural networks using pipeline parallelism. Available: https:\/\/arxiv.org\/abs\/1811.06965"},{"key":"9452_CR47","doi-asserted-by":"crossref","unstructured":"Huang J, Gao Y, Dong W (2024) Elastic dnn inference with unpredictable exit in edge computing, IEEE Trans Mob Comput","DOI":"10.1109\/TMC.2024.3441946"},{"key":"9452_CR48","doi-asserted-by":"crossref","unstructured":"Huang Y, Sansom J, Ma Z, Gervits F, Chai J (2024) Drivlme: Enhancing llm-based autonomous driving agents with embodied and social experiences. In: 2024 IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 3153\u20133160","DOI":"10.1109\/IROS58592.2024.10802555"},{"key":"9452_CR49","doi-asserted-by":"crossref","unstructured":"Hu C, Chen Y, Kara D, Liu S, Abdelzaher T, Wu F, Chen G (2025) Openmae: efficient masked autoencoder for vibration sensing with open-domain data enrichment. 
In: Proceedings of the ACM on Interactive, mobile, wearable and ubiquitous technologies (ACM IMWUT), and UbiComp","DOI":"10.1145\/3729485"},{"key":"9452_CR50","doi-asserted-by":"crossref","unstructured":"Hu Y, Gokarn I, Liu S, Misra A, Abdelzaher T (2023) Underprovisioned gpus: on sufficient capacity for real-time mission-critical perception. In: 2023 32nd international conference on computer communications and networks (ICCCN). IEEE, pp \u201310","DOI":"10.1109\/ICCCN58024.2023.10230127"},{"key":"9452_CR51","doi-asserted-by":"crossref","unstructured":"Hu Y, Gokarn I, Liu S, Misra A, Abdelzaher T (2024) Algorithms for canvas-based attention scheduling with resizing. In: IEEE 30th Real-Time and embedded technology and applications symposium (RTAS). IEEE 2024:348\u2013359","DOI":"10.1109\/RTAS61025.2024.00035"},{"key":"9452_CR52","unstructured":"Hu C, Huang H, Hu J, Xu J, Chen X, Xie T, Wang C, Wang S, Bao Y, Sun N, Shan Y (2024) Memserve: Context caching for disaggregated llm serving with elastic memory pool, Available: https:\/\/arxiv.org\/abs\/2406.17565"},{"key":"9452_CR53","doi-asserted-by":"crossref","unstructured":"Huzaifa M, Desai R, Grayson S, Jiang X, Jing Y, Lee J, Lu F, Pang Y, Ravichandran J, Sinclair F, Tian B, Yuan H, Zhang J, Adve S (2022) ILLIXR: An open testbed to enable extended reality systems research. In: IEEE Micro, special issue on the Top Picks from the 2021 Computer Architecture Conferences, vol. 42, issue 4, July-August","DOI":"10.1109\/MM.2022.3161018"},{"key":"9452_CR54","doi-asserted-by":"crossref","unstructured":"Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard A, Adam H, Kalenichenko D (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2704\u20132713","DOI":"10.1109\/CVPR.2018.00286"},{"key":"9452_CR55","doi-asserted-by":"crossref","unstructured":"Jayaram KR, Muthusamy V, Dube P, Ishakian V, Wang C, Herta B, Boag S, Arroyo D, Tantawi A, Verma A, Pollok F, Khalaf R (2019) Ffdl: A flexible multi-tenant deep learning platform. In: Proceedings of the 20th international middleware conference, ser. Middleware \u201919. New York, NY, USA: Association for Computing Machinery, pp 82\u201395. [Online]. Available: https:\/\/doi.org\/10.1145\/3361525.3361538","DOI":"10.1145\/3361525.3361538"},{"key":"9452_CR56","unstructured":"Jeon M, Venkataraman S, Phanishayee A, Qian J, Xiao W, Yang F (2019) Analysis of Large-Scale Multi-Tenant GPU clusters for DNN training workloads. In: 2019 USENIX annual technical conference (USENIX ATC 19). Renton, WA: USENIX Association, pp 947\u2013960. Available: https:\/\/www.usenix.org\/conference\/atc19\/presentation\/jeon"},{"key":"9452_CR57","doi-asserted-by":"crossref","unstructured":"Jin Q, Yang L, Liao Z (2020) Adabits: Neural network quantization with adaptive bit-widths. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2146\u20132156","DOI":"10.1109\/CVPR42600.2020.00222"},{"key":"9452_CR58","doi-asserted-by":"crossref","unstructured":"Ji M, Yi S, Koo C, Ahn S, Seo D, Dutt N, Kim J-C (2022) Demand layering for real-time dnn inference with minimized memory usage. In: IEEE real-time systems symposium (RTSS). IEEE 2022:291\u2013304","DOI":"10.1109\/RTSS55097.2022.00033"},{"key":"9452_CR59","doi-asserted-by":"crossref","unstructured":"Kang W, Chung S, Kim JY, Lee Y, Lee K, Lee J, Shin KG, Chwa HS (2022) Dnn-sam: Split-and-merge dnn execution for real-time object detection. In: IEEE 28th real-time and embedded technology and applications symposium (RTAS). 
IEEE 2022:160\u2013172","DOI":"10.1109\/RTAS54340.2022.00021"},{"key":"9452_CR60","doi-asserted-by":"crossref","unstructured":"Kannan SS, Venkatesh VL, Min B-C (2024) Smart-llm: Smart multi-agent robot task planning using large language models. In: 2024 IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 12\u00a0140\u201312\u00a0147","DOI":"10.1109\/IROS58592.2024.10802322"},{"key":"9452_CR61","first-page":"2795","volume":"2024","author":"D Kara","year":"2024","unstructured":"Kara D, Kimura T, Liu S, Li J, Liu D, Wang T, Wang R, Chen Y, Hu Y, Abdelzaher T (2024) Freqmae: Frequency-aware masked autoencoder for multi-modal iot sensing. Proc ACM Web Conf 2024:2795\u20132806","journal-title":"Proc ACM Web Conf"},{"key":"9452_CR62","doi-asserted-by":"crossref","unstructured":"Kara D, Kimura T, Chen Y, Li J, Wang R, Chen Y, Wang T, Liu S, Abdelzaher T (2024) Phymask: An adaptive masking paradigm for efficient self-supervised learning in iot. In: Proceedings of the 22nd ACM conference on embedded networked sensor systems, pp 97\u2013111","DOI":"10.1145\/3666025.3699325"},{"issue":"1","key":"9452_CR63","doi-asserted-by":"publisher","first-page":"727","DOI":"10.1007\/s11831-022-09815-7","volume":"30","author":"M Khodarahmi","year":"2023","unstructured":"Khodarahmi M, Maihami V (2023) A review on Kalman filter models. Arch Comput Methods Eng 30(1):727\u2013747","journal-title":"Arch Comput Methods Eng"},{"key":"9452_CR64","doi-asserted-by":"crossref","unstructured":"Kim J-E, Bradford R, Shao Z (2020) Anytimenet: Controlling time-quality tradeoffs in deep neural network architectures. In: Design, Automation and Test in Europe Conference and Exhibition (DATE) 2020:945\u2013950","DOI":"10.23919\/DATE48585.2020.9116280"},
{"key":"9452_CR65","doi-asserted-by":"crossref","unstructured":"Kim J-E, Bradford R, Yoon M-K, Shao Z (2020) Abc: Abstract prediction before concreteness. In: Design, Automation and Test in Europe Conference and Exhibition (DATE) 2020:1103\u20131108","DOI":"10.23919\/DATE48585.2020.9116479"},{"key":"9452_CR66","first-page":"3084","volume":"2025","author":"T Kimura","year":"2025","unstructured":"Kimura T, Li X, Hanna O, Chen Y, Chen Y, Kara D, Wang T, Li J, Ouyang X, Liu S et al. (2025) Infomae: Pair-efficient cross-modal alignment for multimodal time-series sensing signals. Proc ACM Web Conf 2025:3084\u20133095","journal-title":"Proc ACM Web Conf"},{"key":"9452_CR67","doi-asserted-by":"crossref","unstructured":"Kimura T, Li J, Wang T, Chen Y, Wang R, Kara D, Wigness M, Bhattacharyya J, Srivatsa M, Liu S, Srivastava M, Diggavi S, Abdelzaher T (2024) Vibrofm: Towards micro foundation models for robust multimodal iot sensing. In: 2024 IEEE 21st international conference on mobile ad hoc and smart systems (MASS), IEEE","DOI":"10.1109\/MASS62177.2024.00014"},{"key":"9452_CR68","doi-asserted-by":"crossref","unstructured":"Kimura T, Misra A, Chen Y, Kara D, Li J, Wang T, Wang R, Bhattacharyya J, Kim J, Shenoy P, Srivastava M, Wigness MW, Abdelzaher T (2024) The case for micro foundation models to support robust edge intelligence. In: IEEE CogMI","DOI":"10.1109\/CogMI62246.2024.00014"},{"key":"9452_CR69","unstructured":"Kingma DP, Welling M (2013) Auto-encoding variational bayes, arXiv preprint arXiv:1312.6114"},{"key":"9452_CR70","doi-asserted-by":"crossref","unstructured":"Kwon W, Li Z, Zhuang S, Sheng Y, Zheng L, Yu CH, Gonzalez J, Zhang H, Stoica I (2023) Efficient memory management for large language model serving with pagedattention. In: Proceedings of the 29th symposium on operating systems principles, pp 611\u2013626","DOI":"10.1145\/3600006.3613165"},{"key":"9452_CR71","unstructured":"Le\u00a0Guennec A, Malinowski S, Tavenard R (2016) Data augmentation for time series classification using convolutional neural networks. 
In: ECML\/PKDD workshop on advanced analytics and learning on temporal data"},{"key":"9452_CR72","unstructured":"Lee K, Gangidi A, Oldham M (2024) Building meta\u2019s genai infrastructure, https:\/\/engineering.fb.com\/2024\/03\/12\/data-center-engineering\/building-metas-genai-infrastructure\/, accessed: 2025-04-30"},{"key":"9452_CR73","unstructured":"Lee W, Lee J, Seo J, Sim J (2024) InfiniGen: Efficient generative inference of large language models with dynamic KV cache management. In: 18th USENIX symposium on operating systems design and implementation (OSDI 24), pp 155\u2013172"},{"key":"9452_CR74","doi-asserted-by":"crossref","unstructured":"Liang J, Zhao C, Wang M, Qiu X, Li L (2021) Finding sparse structures for domain specific neural machine translation. In: Proceedings of the AAAI conference on artificial intelligence, vol.\u00a035, no.\u00a015, pp 13\u00a0333\u201313\u00a0342","DOI":"10.1609\/aaai.v35i15.17574"},{"key":"9452_CR75","doi-asserted-by":"crossref","unstructured":"Li Y, Li Z, Yang W, Liu C (2023) Rt-lm: Uncertainty-aware resource management for real-time inference of language models. arXiv preprint arXiv:2309.06619","DOI":"10.1109\/RTSS59052.2023.00023"},{"key":"9452_CR76","doi-asserted-by":"crossref","unstructured":"Liu D (2022) Self-supervised learning frameworks for iot applications, Ph.D. dissertation","DOI":"10.1007\/978-3-031-40787-1_2"},{"issue":"2","key":"9452_CR77","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1007\/s11241-023-09396-z","volume":"59","author":"S Liu","year":"2023","unstructured":"Liu S, Fu X, Hu Y, Wigness M, David P, Yao S, Sha L, Abdelzaher T (2023) Generalized self-cueing real-time attention scheduling with intermittent inspection and image resizing. Real-Time Syst 59(2):302\u2013343","journal-title":"Real-Time Syst"},{"key":"9452_CR78","unstructured":"Liu A, Feng B, Xue B, Wang B, Wu B, Lu C, Zhao C, Deng C, Zhang C, Ruan C et\u00a0al. 
(2024) Deepseek-v3 technical report, arXiv preprint arXiv:2412.19437"},{"key":"9452_CR79","unstructured":"Liu S, Kimura T, Liu D, Wang R, Li J, Diggavi S, Srivastava M, Abdelzaher T (2023) Focal: Contrastive learning for multimodal time-series sensing signals in factorized orthogonal latent space. In: Advances in Neural Information Processing Systems"},{"key":"9452_CR80","doi-asserted-by":"crossref","unstructured":"Liu Z, Wang Y, Han K, Ma S, Gao W (2022) Instance-aware dynamic neural network quantization. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 12\u00a0434\u201312\u00a0443","DOI":"10.1109\/CVPR52688.2022.01211"},{"key":"9452_CR81","doi-asserted-by":"crossref","unstructured":"Liu S, Wang T, Li J, Sun D, Srivastava M, Abdelzaher T (2022) Adamask: enabling machine-centric video streaming with adaptive frame masking for dnn inference offloading. In: Proceedings of the 30th ACM international conference on multimedia, pp 3035\u20133044","DOI":"10.1145\/3503161.3548033"},{"key":"9452_CR82","doi-asserted-by":"crossref","unstructured":"Liu S, Yao S, Fu X, Tabish R, Yu S, Bansal A, Yun H, Sha L, Abdelzaher T (2020) On removing algorithmic priority inversion from mission-critical machine inference pipelines. In: IEEE real-time systems symposium (RTSS). IEEE 2020:319\u2013332","DOI":"10.1109\/RTSS49844.2020.00037"},{"key":"9452_CR83","unstructured":"Liu Z, Yuan J, Jin H, Zhong S, Xu Z, Braverman V, Chen B, Hu X (2024) Kivi: a tuning-free asymmetric 2bit quantization for kv cache. In: Proceedings of the 41st international conference on machine learning, pp 32\u00a0332\u201332\u00a0344"},{"key":"9452_CR84","doi-asserted-by":"crossref","unstructured":"Liu H, Zhu Y, Kato K, Tsukahara A, Kondo I, Aoyama T, Hasegawa Y (2024) Enhancing the llm-based robot manipulation through human-robot collaboration. 
In: IEEE robotics and automation letters","DOI":"10.1109\/LRA.2024.3415931"},{"key":"9452_CR85","unstructured":"Li B, Zhang Y, Guo D, Zhang R, Li F, Zhang H, Zhang K, Zhang P, Li Y, Liu Z et\u00a0al. (2024) Llava-onevision: Easy visual task transfer, arXiv preprint arXiv:2408.03326"},{"key":"9452_CR86","unstructured":"Li G, Zheng H, Liu D, Su B, Zheng C (2022) Semmae: Semantic-guided masking for learning masked autoencoders, ArXiv, Available: https:\/\/consensus.app\/papers\/semmae-semanticguided-masking-learning-masked-li\/db05155d3afd56958dd95cdf3d84e0d2\/?utm_source=chatgpt"},{"key":"9452_CR87","doi-asserted-by":"crossref","unstructured":"Lohn A, Musser M (2022) Ai and compute: How much longer can computing power drive artificial intelligence progress? In: Center for security and emerging technology, pp 1\u201311","DOI":"10.51593\/2021CA009"},{"key":"9452_CR88","unstructured":"Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S et\u00a0al. (2020) Language models are few-shot learners, arXiv preprint arXiv:2005.14165, vol.\u00a01, p\u00a03"},{"key":"9452_CR89","doi-asserted-by":"crossref","unstructured":"Misra A, Laurel J, Misailovic S (2023) Vix: Analysis-driven compiler for efficient low-precision variational inference. In: 2023 design, automation and test in Europe conference and exhibition, DATE 2023 - Proceedings, ser. Proceedings - Design, Automation and Test in Europe, DATE. United States: Institute of Electrical and Electronics Engineers Inc.","DOI":"10.23919\/DATE56975.2023.10137324"},{"key":"9452_CR90","unstructured":"Misra A, Saoda N, Abdelzaher T Latency-constrained input-aware quantization of time series inference workflows at the edge. In: 2025 IEEE International conference on computer communications (INFOCOM). 
IEEE, in press"},{"key":"9452_CR91","unstructured":"Mo S, Salakhutdinov R, Morency L-P, Liang PP (2024) Iot-lm: Large multisensory language models for the internet of things, arXiv preprint arXiv:2407.09801"},{"key":"9452_CR92","unstructured":"Niizumi D, Takeuchi D, Ohishi Y, Harada N, Kashino K (2022) Masked spectrogram modeling using masked autoencoders for learning general-purpose audio representation, arXiv preprint arXiv:2204.12260"},{"key":"9452_CR93","unstructured":"Paliotta D, Wang J, Pagliardini M, Li KY, Bick A, Kolter JZ, Gu A, Fleuret F, Dao T (2025) Thinking slow, fast: Scaling inference compute with distilled reasoners, Available: https:\/\/arxiv.org\/abs\/2502.20339"},{"key":"9452_CR94","doi-asserted-by":"crossref","unstructured":"Park DS, Chan W, Zhang Y, Chiu C-C, Zoph B, Cubuk ED, Le QV (2019) Specaugment: a simple data augmentation method for automatic speech recognition, arXiv preprint arXiv:1904.08779","DOI":"10.21437\/Interspeech.2019-2680"},{"key":"9452_CR95","doi-asserted-by":"crossref","unstructured":"Pialla G, Devanne M, Weber J, Idoumghar L, Forestier G (2022) Data augmentation for time series classification with deep learning models. In: International workshop on advanced analytics and learning on temporal data. Springer, pp 117\u2013132","DOI":"10.1007\/978-3-031-24378-3_8"},{"key":"9452_CR96","unstructured":"Qiu H, Biswas A, Zhao Z, Mohan J, Khare A, Choukse E, Goiri \u00cd, Zhang Z, Shen H, Bansal C et\u00a0al. (2025) Modserve: Scalable and resource-efficient large multimodal model serving"},{"key":"9452_CR97","unstructured":"Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners"},{"key":"9452_CR98","unstructured":"Ramesh A, Pavlov M, Goh G, Gray S, Voss C, Radford A, Chen M, Sutskever I (2021) Zero-shot text-to-image generation. In: International conference on machine learning. 
PMLR, pp 8821\u20138831"},{"key":"9452_CR99","doi-asserted-by":"crossref","unstructured":"Sharif H, Kotsifakou M, Zhao Y, Kothari A, Schreiber B, Wang E, Sarita Y, Zhao N, Joshi K, Adve V, Misailovic S, Adve S (2021) Approxtuner: A compiler and runtime system for adaptive approximations. In: Proc. principles and practice of parallel programming (PPoPP\u201921)","DOI":"10.1145\/3437801.3446108"},{"key":"9452_CR100","unstructured":"Sheng Y, Cao S, Li D, Hooper C, Lee N, Yang S, Chou C, Zhu B, Zheng L, Keutzer K, Gonzalez JE, Stoica I (2024) S-LoRA: Serving thousands of concurrent LoRA adapters, Available: https:\/\/arxiv.org\/abs\/2311.03285"},{"key":"9452_CR101","unstructured":"Sheng Y, Zheng L, Yuan B, Li Z, Ryabinin M, Chen B, Liang P, R\u00e9 C, Stoica I, Zhang C (2023) Flexgen: High-throughput generative inference of large language models with a single gpu. In: International conference on machine learning. PMLR, pp 31\u00a0094\u201331\u00a0116"},{"key":"9452_CR102","unstructured":"Shkolnik M, Chmiel B, Banner R, Shomron G, Nahshan Y, Bronstein AM, Weiser UC (2020) Robust quantization: One model to rule them all, CoRR, vol. abs\/2002.07686. Available: https:\/\/arxiv.org\/abs\/2002.07686"},{"key":"9452_CR103","unstructured":"Shoeybi M, Patwary M, Puri R, LeGresley P, Casper J, Catanzaro B (2020) Megatron-LM: Training multi-billion parameter language models using model parallelism, Available: https:\/\/arxiv.org\/abs\/1909.08053"},{"key":"9452_CR104","doi-asserted-by":"crossref","unstructured":"Singh R, Huzaifa M, Liu J, Patney A, Sharif H, Zhao Y, Adve S (2023) Power, performance, and image quality tradeoffs in foveated rendering. In: 30th IEEE conference on virtual reality and 3D user interfaces (IEEE VR)","DOI":"10.1109\/VR55154.2023.00036"},{"key":"9452_CR105","unstructured":"Sohl-Dickstein J, Weiss E, Maheswaranathan N, Ganguli S (2015) Deep unsupervised learning using nonequilibrium thermodynamics. In: International conference on machine learning. 
PMLR, pp 2256\u20132265"},{"key":"9452_CR106","doi-asserted-by":"crossref","unstructured":"Soyyigit A, Yao S, Yun H (2022) Anytime-lidar: deadline-aware 3d object detection. In: 2022 IEEE 28th international conference on embedded and real-time computing systems and applications (RTCSA). IEEE, pp 31\u201340","DOI":"10.1109\/RTCSA55878.2022.00010"},{"key":"9452_CR107","unstructured":"Sun B, Huang Z, Zhao H, Xiao W, Zhang X, Li Y, Lin W (2024) Llumnix: Dynamic scheduling for large language model serving. In: 18th USENIX symposium on operating systems design and implementation (OSDI 24), pp 173\u2013191"},{"key":"9452_CR108","doi-asserted-by":"crossref","unstructured":"Sun X, Panda R, Chen C-FR, Wang N, Pan B, Oliva A, Feris R, Saenko K (2024) Improved techniques for quantizing deep networks with adaptive bit-widths. In: Proceedings of the IEEE\/CVF winter conference on applications of computer vision, pp 957\u2013967","DOI":"10.1109\/WACV57701.2024.00100"},{"key":"9452_CR109","unstructured":"Tang C, Yu W, Sun G, Chen X, Tan T, Li W, Lu L, Ma Z, Zhang C (2023) Salmonn: Towards generic hearing abilities for large language models. In: The twelfth international conference on learning representations"},{"key":"9452_CR110","unstructured":"Team G, Anil R, Borgeaud S, Alayrac J-B, Yu J, Soricut R, Schalkwyk J, Dai AM, Hauth A, Millican K et\u00a0al. (2023) Gemini: a family of highly capable multimodal models, arXiv preprint arXiv:2312.11805"},{"key":"9452_CR111","doi-asserted-by":"crossref","unstructured":"Um TT, Pfister FM, Pichler D, Endo S, Lang M, Hirche S, Fietzek U, Kuli\u0107 D (2017) Data augmentation of wearable sensor data for Parkinson\u2019s disease monitoring using convolutional neural networks. 
In: Proceedings of the 19th ACM international conference on multimodal interaction, pp 216\u2013220","DOI":"10.1145\/3136755.3136817"},{"key":"9452_CR112","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, 30"},{"key":"9452_CR113","doi-asserted-by":"crossref","unstructured":"Wan Z, Wu Z, Liu C, Huang J, Zhu Z, Jin P, Wang L, Yuan L (2024) Look-m: Look-once optimization in kv cache for efficient multimodal long-context inference. In: Findings of the Association for Computational Linguistics: EMNLP 2024, pp 4065\u20134078","DOI":"10.18653\/v1\/2024.findings-emnlp.235"},{"key":"9452_CR114","doi-asserted-by":"crossref","unstructured":"Wang T, Chen Y, Yang Q, Sun D, Wang R, Li J, Kimura T, Abdelzaher T (2024) Data augmentation for human activity recognition via condition space interpolation within a generative model. In: 2024 33rd International conference on computer communications and networks (ICCCN). IEEE, pp 1\u20139","DOI":"10.1109\/ICCCN61486.2024.10637566"},{"key":"9452_CR115","doi-asserted-by":"crossref","unstructured":"Wang T, Kara D, Li J, Liu S, Abdelzaher T, Jalaian B (2022) The methodological pitfall of dataset-driven research on deep learning: An iot example. In: MILCOM 2022-2022 IEEE military communications conference (MILCOM). IEEE, pp 1082\u20131087","DOI":"10.1109\/MILCOM55135.2022.10017612"},{"key":"9452_CR116","unstructured":"Wang Z, Li JB, Qu S, Metze F, Strubell E (2022) Error-aware quantization through noise tempering, arXiv preprint arXiv:2212.05603"},{"key":"9452_CR117","doi-asserted-by":"crossref","unstructured":"Wang K, Liu Z, Lin Y, Lin J, Han S (2019) Haq: Hardware-aware automated quantization with mixed precision. 
In: IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2019.00881"},{"key":"9452_CR118","doi-asserted-by":"crossref","unstructured":"Wang T, Li J, Wang R, Kara D, Liu S, Wertheimer D, Viros\u00a0i Martin A, Ganti R, Srivatsa M, Abdelzaher T (2023) Sudokusens: Enhancing deep learning robustness for iot sensing applications using a generative approach. In: Proceedings of the 21st ACM conference on embedded networked sensor systems, pp 15\u201327","DOI":"10.1145\/3625687.3625785"},{"key":"9452_CR119","unstructured":"Wang T, Li J, Yang Q, Wang R, Chen Y, Sun D, Li B, Hu Y, Kimura T, Kara D, Abdelzaher T (2025) Dynagen: Conditional diffusion models for enhancing acoustic and seismic-based vehicle detection. In: Proc. IEEE conference on computer communications (INFOCOM)"},{"key":"9452_CR120","unstructured":"Wang X, Luo Y, Crankshaw D, Tumanov A, Yu F, Gonzalez JE (2018) IDK cascades: Fast deep learning by learning not to overthink. In: Proceedings of the Thirty-fourth conference on uncertainty in artificial intelligence, UAI 2018, Monterey, California, USA, August 6-10, A.\u00a0Globerson and R.\u00a0Silva, Eds. AUAI Press, pp 580\u2013590"},{"key":"9452_CR121","unstructured":"Wang T, Yang Q, Wang R, Sun D, Li J, Chen Y, Hu Y, Yang C, Kimura T, Kara D et\u00a0al. (2024) Fine-grained control of generative data augmentation in iot sensing. In: Advances in neural information processing systems, 37, pp 32\u00a0787\u201332\u00a0812"},{"key":"9452_CR122","doi-asserted-by":"crossref","unstructured":"Wei Y, Wang Z, Lu Y, Xu C, Liu C, Zhao H, Chen S, Wang Y (2024) Editable scene simulation for autonomous driving via collaborative llm-agents. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 15\u00a0077\u201315\u00a0087","DOI":"10.1109\/CVPR52733.2024.01428"},{"key":"9452_CR123","doi-asserted-by":"crossref","unstructured":"Wen Q, Sun L, Yang F, Song X, Gao J, Wang X, Xu H (2020) Time series data augmentation for deep learning: a survey, arXiv preprint arXiv:2002.12478","DOI":"10.24963\/ijcai.2021\/631"},{"key":"9452_CR124","unstructured":"Wu H, Judd P, Zhang X, Isaev M, Micikevicius P (2020) Integer quantization for deep learning inference: Principles and empirical evaluation, arXiv preprint arXiv:2004.09602"},{"key":"9452_CR125","unstructured":"Wu B, Wang Y, Zhang P, Tian Y, Vajda P, Keutzer K (2018) Mixed precision quantization of convnets via differentiable neural architecture search, arXiv preprint arXiv:1812.00090"},{"key":"9452_CR126","unstructured":"Wu B, Zhu R, Zhang Z, Sun P, Liu X, Jin X (2024) dLoRA: Dynamically orchestrating requests and adapters for LoRA LLM serving. In: 18th USENIX symposium on operating systems design and implementation (OSDI 24). Santa Clara, CA: USENIX Association, pp 911\u2013927. Available: https:\/\/www.usenix.org\/conference\/osdi24\/presentation\/wu-bingyang"},{"key":"9452_CR127","unstructured":"Xiao G, Lin J, Seznec M, Wu H, Demouth J, Han S (2023) Smoothquant: Accurate and efficient post-training quantization for large language models. In: International conference on machine learning. PMLR, pp 38\u00a0087\u201338\u00a0099"},{"key":"9452_CR128","unstructured":"Xiao G, Tian Y, Chen B, Han S, Lewis M (2023) Efficient streaming language models with attention sinks. In: The twelfth international conference on learning representations"},{"key":"9452_CR129","doi-asserted-by":"crossref","unstructured":"Xu K, Feng Q, Zhang X, Wang D (2022) Multiquant: Training once for multi-bit quantization of neural networks. In: Proceedings of the thirty-first international joint conference on artificial intelligence, ser. 
IJCAI-2022. International joint conferences on artificial intelligence organization. Available: http:\/\/dx.doi.org\/10.24963\/ijcai.2022\/504","DOI":"10.24963\/ijcai.2022\/504"},{"key":"9452_CR130","doi-asserted-by":"crossref","unstructured":"Xu K, Han L, Tian Y, Yang S, Zhang X (2023) Eq-net: Elastic quantization neural networks. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 1505\u20131514","DOI":"10.1109\/ICCV51070.2023.00145"},{"key":"9452_CR131","unstructured":"Xu X, Li M, Tao C, Shen T, Cheng R, Li J, Xu C, Tao D, Zhou T (2024) A survey on knowledge distillation of large language models, Available: http:\/\/arxiv.org\/abs\/2402.13116"},{"key":"9452_CR132","doi-asserted-by":"crossref","unstructured":"Yang N, Jang Y, Lee H, Jung S, Jung K (2022) Task-specific compression for multi-task language models using attribution-based pruning, arXiv preprint arXiv:2205.04157","DOI":"10.18653\/v1\/2023.findings-eacl.43"},{"key":"9452_CR133","doi-asserted-by":"crossref","unstructured":"Yao S, Hao Y, Zhao Y, Shao H, Liu D, Liu S, Wang T, Li J, Abdelzaher T (2020) Scheduling real-time deep learning services as imprecise computations. In: 2020 IEEE 26th international conference on embedded and real-time computing systems and applications (RTCSA), pp 1\u201310","DOI":"10.1109\/RTCSA50079.2020.9203676"},{"key":"9452_CR134","doi-asserted-by":"crossref","unstructured":"Yao S, Hu S, Zhao Y, Zhang A, Abdelzaher T (2017) Deepsense: a unified deep learning framework for time-series mobile sensing data processing, ser. WWW \u201917. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, pp 351\u2013360","DOI":"10.1145\/3038912.3052577"},{"key":"9452_CR135","doi-asserted-by":"crossref","unstructured":"Yao S, Li J, Liu D, Wang T, Liu S, Shao H, Abdelzaher T (2020) Deep compressive offloading: speeding up neural network inference by trading edge computation for network latency. 
In: Proceedings of the 18th conference on embedded networked sensor systems, pp 476\u2013488","DOI":"10.1145\/3384419.3430898"},{"key":"9452_CR136","doi-asserted-by":"crossref","unstructured":"Yao S, Piao A, Jiang W, Zhao Y, Shao H, Liu S, Liu D, Li J, Wang T, Hu S et\u00a0al. (2019) Stfnets: Learning sensing signals from the time-frequency perspective with short-time fourier neural networks. In: The world wide web conference, pp 2192\u20132202","DOI":"10.1145\/3308558.3313426"},{"key":"9452_CR137","doi-asserted-by":"crossref","unstructured":"Yao S, Zhao Y, Zhang A, Su L, Abdelzaher T (2017) Deepiot: Compressing deep neural network structures for sensing systems with a compressor-critic framework. In: Proceedings of the 15th ACM conference on embedded network sensor systems, ser. SenSys \u201917. New York, NY, USA: Association for Computing Machinery","DOI":"10.1145\/3131672.3131675"},{"key":"9452_CR138","unstructured":"Youssef A (2020) Why the Kubernetes scheduler is not enough for your AI workloads, Available: https:\/\/www.cncf.io\/blog\/2020\/08\/10\/why-the-kubernetes-scheduler-is-not-enough-for-your-ai-workloads\/"},{"key":"9452_CR139","unstructured":"Yu G-I, Jeong JS, Kim G-W, Kim S, Chun B-G (2022) Orca: A distributed serving system for Transformer-Based generative models. In: 16th USENIX symposium on operating systems design and implementation (OSDI 22), pp 521\u2013538"},{"key":"9452_CR140","first-page":"3304","volume":"37","author":"T Zhang","year":"2024","unstructured":"Zhang T, Yi J, Xu Z, Shrivastava A (2024) Kv cache is 1 bit per channel: efficient large language model inference with coupled quantization. Adv Neural Inf Process Syst 37:3304\u20133331","journal-title":"Adv Neural Inf Process Syst"},{"key":"9452_CR141","unstructured":"Zhang Y, Du Y, Luo G, Zhong Y, Zhang Z, Liu S, Ji R (2024) Cam: Cache merging for memory-efficient llms inference. 
In: Forty-first international conference on machine learning"},{"key":"9452_CR142","doi-asserted-by":"crossref","unstructured":"Zhang N, Liu Y, Zhao X, Cheng W, Bao R, Zhang R, Mitra P, Chen H (2024) Pruning as a domain-specific llm extractor, arXiv preprint arXiv:2405.06275","DOI":"10.18653\/v1\/2024.findings-naacl.91"},{"key":"9452_CR143","doi-asserted-by":"crossref","unstructured":"Zhao Y, Gu A, Varma R, Luo L, Huang CC, Xu M, Wright L, Shojanazeri H, Ott M, Shleifer S, Desmaison A, Balioglu C, Damania P, Nguyen B, Chauhan G, Hao Y, Mathews A, Li S (2023) PyTorch FSDP: Experiences on scaling fully sharded data parallel, Available: https:\/\/arxiv.org\/abs\/2304.11277","DOI":"10.14778\/3611540.3611569"},{"key":"9452_CR144","unstructured":"Zhao Y, Sharif H, Pao-Huang P, Shah VN, Sivakumar AN, Gasparino MV, Mahmoud A, Zhao N, Adve S, Chowdhary G, Misailovic S, Adve V (2023) ApproxCaliper: A programmable framework for application-aware neural network optimization. In: Proc. of machine learning and systems (MLSys)"},{"key":"9452_CR145","unstructured":"Zheng L, Yin L, Xie Z, Sun CL, Huang J, Yu CH, Cao S, Kozyrakis C, Stoica I, Gonzalez JE et\u00a0al. (2024) Sglang: Efficient execution of structured language model programs. In: Advances in neural information processing systems, 37, pp 62\u00a0557\u201362\u00a0583"},{"key":"9452_CR146","unstructured":"Zhong Y, Liu S, Chen J, Hu J, Zhu Y, Liu X, Jin X, Zhang H (2024) DistServe: Disaggregating prefill and decoding for goodput-optimized large language model serving. 
In: 18th USENIX symposium on operating systems design and implementation (OSDI 24), pp 193\u2013210"},{"key":"9452_CR147","unstructured":"Zhou S, Wu Y, Ni Z, Zhou X, Wen H, Zou Y (2016) Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients, arXiv preprint arXiv:1606.06160"}],"container-title":["Real-Time Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11241-025-09452-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11241-025-09452-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11241-025-09452-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T01:13:57Z","timestamp":1757207637000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11241-025-09452-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6]]},"references-count":147,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["9452"],"URL":"https:\/\/doi.org\/10.1007\/s11241-025-09452-w","relation":{},"ISSN":["0922-6443","1573-1383"],"issn-type":[{"value":"0922-6443","type":"print"},{"value":"1573-1383","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6]]},"assertion":[{"value":"6 June 2025","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 July 2025","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}