{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T00:36:28Z","timestamp":1760056588284,"version":"build-2065373602"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/100000028","name":"Semiconductor Research Corporation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000028","id-type":"DOI","asserted-by":"crossref"}]},{"name":"India Research Program","award":["2020-IR-2979"],"award-info":[{"award-number":["2020-IR-2979"]}]},{"name":"R Systems Center of Excellence on Sustainable Artificial Intelligence"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Integrating domain-specific hardware accelerators on modern systems on chips (SoCs) has enabled complex applications, such as vision, natural language processing, autonomous driving, and augmented reality, on small form factors. This leads to challenges in the integration of accelerators, with high memory bandwidth requirements and strict deadlines, on the system\u2019s memory hierarchy. The system-level shared cache, or last-level cache (LLC), is a critical resource shared by multi-core processors, GPUs, and hardware accelerators in modern heterogeneous SoCs. It significantly reduces the bottleneck at the off-chip memory and delivers high performance. With the integration of accelerators on the LLC gaining momentum, the on-chip shared cache management becomes vital. If not managed intelligently, the interference between cache requests from the cores and the accelerators can significantly deteriorate their performance.<\/jats:p>\n          <jats:p>\n            Given the architectural differences between DRAM and cache systems, the off-chip memory management strategies explored by previous works cannot be extended to the LLC. We propose a deadline-aware flexible LLC arbitration and scheduling framework,\n            <jats:italic toggle=\"yes\">FLASH<\/jats:italic>\n            , to dynamically partition the LLC bandwidth between the accelerators and multi-core processors to meet the deadline given for the accelerator while minimizing the impact on the performance of the cores. FLASH arbitrates between the requests from the cores and the accelerators and schedules the requests depending on the accelerator\u2019s progress and its chances of meeting the deadline. We evaluate FLASH across different workloads and hardware accelerator configurations to show that it not only achieves significantly better performance for the cores than other static scheduling policies but also significantly reduces the deadline miss rates of the accelerator.\n          <\/jats:p>","DOI":"10.1145\/3757742","type":"journal-article","created":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:17:32Z","timestamp":1753874252000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["FLASH: Deadline-Aware Flexible LLC Arbitration and Scheduling for Hardware Accelerators"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7353-831X","authenticated-orcid":false,"given":"Ayushi","family":"Agarwal","sequence":"first","affiliation":[{"name":"Amar Nath and Shashi Khosla School of Information Technology, Indian Institute of Technology Delhi","place":["New Delhi, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4399-1007","authenticated-orcid":false,"given":"Pulkit","family":"Goel","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Delhi","place":["New Delhi, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6198-2581","authenticated-orcid":false,"given":"P.J.","family":"Joseph","sequence":"additional","affiliation":[{"name":"NXP Semiconductors India","place":["Noida, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9939-2774","authenticated-orcid":false,"given":"Prokash","family":"Ghosh","sequence":"additional","affiliation":[{"name":"IEEE","place":["Noida, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8703-0437","authenticated-orcid":false,"given":"Sourav","family":"Roy","sequence":"additional","affiliation":[{"name":"NXP Semiconductors India","place":["Noida, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2508-7531","authenticated-orcid":false,"given":"Preeti Ranjan","family":"Panda","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, Indian Institute of Technology Delhi","place":["New Delhi, India"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,9]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2996616"},{"key":"e_1_3_2_3_2","first-page":"173","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research)","year":"2016","unstructured":"Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, Jie Chen, Jingdong Chen, Zhijie Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Ke Ding, Niandong Du, Erich Elsen, Jesse Engel, Weiwei Fang, Linxi Fan, Christopher Fougner, Liang Gao, Caixia Gong, Awni Hannun, Tony Han, Lappi Johannes, Bing Jiang, Cai Ju, Billy Jun, Patrick LeGresley, Libby Lin, Junjie Liu, Yang Liu, Weigao Li, Xiangang Li, Dongpeng Ma, Sharan Narang, Andrew Ng, Sherjil Ozair, Yiping Peng, Ryan Prenger, Sheng Qian, Zongfeng Quan, Jonathan Raiman, Vinay Rao, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Kavya Srinet, Anuroop Sriram, Haiyuan Tang, Liliang Tang, Chong Wang, Jidong Wang, Kaifu Wang, Yi Wang, Zhijian Wang, Zhiqian Wang, Shuang Wu, Likai Wei, Bo Xiao, Wen Xie, Yan Xie, Dani Yogatama, Bin Yuan, Jun Zhan, and Zhenyao Zhu. 2016. Deep speech 2 : End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research). PMLR, New York, New York, USA, 173\u2013182. Retrieved from https:\/\/proceedings.mlr.press\/v48\/amodei16.html"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3085572"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/2024716.2024718"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/2996864"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2016.2616357"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3532213.3532250"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2620977"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/HCS52781.2021.9567521"},{"key":"e_1_3_2_11_2","unstructured":"Google. 2019. Edge TPU. Retrieved February 2025 from https:\/\/cloud.google.com\/edge-tpu"},{"key":"e_1_3_2_12_2","volume-title":"Computer Architecture, Fifth Edition: A Quantitative Approach (5th ed.)","author":"Hennessy John L.","year":"2011","unstructured":"John L. Hennessy and David A. Patterson. 2011. Computer Architecture, Fifth Edition: A Quantitative Approach (5th ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_3_2_14_2","unstructured":"HiSilicon. 2020. Kirin 9000. (2020). Retrieved February 2025 from https:\/\/www.hisilicon.com\/en\/products\/Kirin\/Kirin-flagship-chips\/Kirin-9000"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT52795.2021.00023"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. 10.48550\/arXiv.1704.04861","DOI":"10.48550\/arXiv.1704.04861"},{"key":"e_1_3_2_17_2","unstructured":"Intel\u00ae. 2015. White Paper | Improving Real-Time Performance by Utilizing Cache Allocation Technology | Enhancing Performance via Allocation of the Processor\u2019s Cache. Retrieved February 2025 from https:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/white-papers\/cache-allocation-technology-white-paper.pdf"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1815971"},{"key":"e_1_3_2_19_2","unstructured":"Nathan Goulding-Hotta Artem Vasilyev Jason Redgrave Albert Meixner and Ofer Shacham. 2018. Pixel visual core: Google\u2019s fully programmable image vision and AI processor for mobile devices. (2018). Retrieved February 2025 from https:\/\/old.hotchips.org\/hc30\/1conf\/1.02_Google_HC30.Google.JasonRedgrave.V01.pdf"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228513"},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"Norman P. Jouppi Cliff Young Nishant Patil David Patterson Gaurav Agrawal Raminder Bajwa Sarah Bates Suresh Bhatia Nan Boden Al Borchers Rick Boyle Pierre-luc Cantin Clifford Chao Chris Clark Jeremy Coriell Mike Daley Matt Dau Jeffrey Dean Ben Gelb Tara Vazir Ghaemmaghami Rajendra Gottipati William Gulland Robert Hagmann C. Richard Ho Doug Hogberg John Hu Robert Hundt Dan Hurt Julian Ibarz Aaron Jaffey Alek Jaworski Alexander Kaplan Harshit Khaitan Daniel Killebrew Andy Koch Naveen Kumar Steve Lacy James Laudon James Law Diemthu Le Chris Leary Zhuyuan Liu Kyle Lucke Alan Lundin Gordon MacKean Adriana Maggiore Maire Mahony Kieran Miller Rahul Nagarajan Ravi Narayanaswami Ray Ni Kathy Nix Thomas Norrie Mark Omernick Narayana Penukonda Andy Phelps Jonathan Ross Matt Ross Amir Salek Emad Samadiani Chris Severn Gregory Sizikov Matthew Snelham Jed Souter Dan Steinberg Andy Swing Mercedes Tan Gregory Thorson Bo Tian Horia Toma Erick Tuttle Vijay Vasudevan Richard Walter Walter Wang Eric Wilcox and Doe Hyun Yoon. 2017. In-datacenter performance analysis of a tensor processing unit. Retrieved February 2025 from https:\/\/arxiv.org\/pdf\/1704.04760.pdf","DOI":"10.1145\/3140659.3080246"},{"key":"e_1_3_2_22_2","unstructured":"Habana Labs. 2019. GoyaTM Inference Platform White Paper. (2019). Retrieved February 2025 from https:\/\/habana.ai\/wp-content\/uploads\/pdf\/habana_labs_goya_whitepaper.pdf"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3661997"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6168947"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472456.3472461"},{"key":"e_1_3_2_26_2","unstructured":"NVIDIA. 2017. NVDLA: The NVIDIA Deep Learning Accelerator. (2017). Retrieved February 2025 from https:\/\/nvdla.org\/hw\/v1\/hwarch.html"},{"key":"e_1_3_2_27_2","unstructured":"NXP Semiconductors. 2024. i.MX 95 applications processor family: High-performance safety enabled platform with eIQ neutron NPU. (2024). Retrieved February 2025 from https:\/\/www.nxp.com\/products\/i.MX95"},{"key":"e_1_3_2_28_2","unstructured":"NXP Semiconductors. 2024. Layerscape 2088A and 2048A Processors. (2024). Retrieved February 2025 from https:\/\/www.nxp.com\/products\/LS2088A"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00042"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2020.2968066"},{"key":"e_1_3_2_31_2","unstructured":"Qualcomm. 2020. Snapdragon 888 5G Mobile Platform. (2020). Retrieved February 2025 from www.qualcomm.com\/products\/snapdragon-888-5g-mobile-platform"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.49"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2017.37"},{"key":"e_1_3_2_34_2","unstructured":"Joseph Redmon Santosh Kumar Divvala Ross B. Girshick and Ali Farhadi. 2015. You only look once: Unified real-time object detection. CoRR abs\/1506.02640 (2015). arXiv:1506.02640 http:\/\/arxiv.org\/abs\/1506.02640"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339668"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00047"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS48437.2020.00016"},{"key":"e_1_3_2_39_2","unstructured":"Samsung. 2020. Samsung Mobile processor Exynos 990. (2020). Retrieved February 2025 from https:\/\/semiconductor.samsung.com\/processor\/mobile-processor\/exynos-990\/"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2014.6853196"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783751"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"David Silver Julian Schrittwieser Karen Simonyan Ioannis Antonoglou Aja Huang Arthur Guez Thomas Hubert Lucas baker Matthew Lai Adrian Bolton Yutian Chen Timothy P. Lillicrap Fan Hui L. Sifre George van den Driessche Thore Graepel and Demis Hassabis. 2017. Mastering the game of Go without human knowledge. Nature 550 (2017) 354\u2013359. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:205261034","DOI":"10.1038\/nature24270"},{"key":"e_1_3_2_43_2","unstructured":"SiMaai. 2024. Machine Learning System on Chip (MLSoC). (2024). Retrieved February 2025 from https:\/\/sima.ai\/wp-content\/uploads\/2024\/06\/SiMa_MLSoC_ProductBrief_5.20.24.pdf"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2024.3394479"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3362100"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/2847255"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3202663"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2023.3329443"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASICON58565.2023.10396658"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC.2019.8916239"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD45719.2019.8942149"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3424669"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","unstructured":"Yaocheng Xiang Chencheng Ye Xiaolin Wang Yingwei Luo and Zhenlin Wang. 2019. EMBA: Efficient memory bandwidth allocation to improve performance on intel commodity processor. In Proceedings of the 48th International Conference on Parallel Processing (ICPP\u201919). Association for Computing Machinery 1\u201312. 10.1145\/3337821.3337863","DOI":"10.1145\/3337821.3337863"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3757742","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T12:40:54Z","timestamp":1760013654000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3757742"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,9]]},"references-count":53,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3757742"],"URL":"https:\/\/doi.org\/10.1145\/3757742","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2025,10,9]]},"assertion":[{"value":"2025-03-27","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-20","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}