{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T19:49:07Z","timestamp":1774554547581,"version":"3.50.1"},"reference-count":81,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"NSF","award":["CCF-2413870"],"award-info":[{"award-number":["CCF-2413870"]}]},{"name":"NSERC","award":["RGPIN-2023-03478"],"award-info":[{"award-number":["RGPIN-2023-03478"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Meas. Anal. Comput. Syst."],"published-print":{"date-parts":[[2026,3,26]]},"abstract":"<jats:p>As large language models (LLMs) become widely used, their environmental impact \u2014 especially carbon emission \u2014 has attracted more attention. Prior studies focus on compute-related carbon emissions. In this paper, we find that storage is another key contributor. LLM caching, which saves and reuses KV caches for repeated context, reduces operational carbon by avoiding redundant computation. However, this benefit comes at the cost of embodied carbon from high-capacity, high-speed SSDs. As LLMs scale, the embodied carbon of storage grows significantly. To address this tradeoff, we present GreenCache, a carbon-aware cache management framework that dynamically derives resource allocation plans for LLM serving. GreenCache analyzes the correlation between carbon emission and SLO satisfaction, reconfiguring the resource over time to keep the balance between SLO and carbon emission under dynamic workloads. 
Evaluations from real traces demonstrate that GreenCache achieves an average carbon reduction of 15.1% when serving Llama-3 70B in the FR grid, with reductions reaching up to 25.3%, while staying within latency constraints for &gt; 90% of requests.<\/jats:p>","DOI":"10.1145\/3788087","type":"journal-article","created":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T18:49:47Z","timestamp":1774550987000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Cache Your Prompt When It's Green \u2014 Carbon-Aware Caching for Large Language Model Serving"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-5438-1795","authenticated-orcid":false,"given":"Yuyang","family":"Tian","sequence":"first","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8630-7959","authenticated-orcid":false,"given":"Desen","family":"Sun","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2757-9182","authenticated-orcid":false,"given":"Yi","family":"Ding","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, IN, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9706-6177","authenticated-orcid":false,"given":"Sihang","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Waterloo, Waterloo, ON, Canada"}]}],"member":"320","published-online":{"date-parts":[[2026,3,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575754"},{"key":"e_1_2_1_2_1","first-page":"117","volume-title":"Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve. 
In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI","author":"Agrawal Amey","year":"2024","unstructured":"Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and Ramachandran Ramjee. 2024. Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2024). USENIX Association, Santa Clara, CA, USA, 117-134. https:\/\/www.usenix.org\/conference\/osdi24\/presentation\/agrawal"},{"key":"e_1_2_1_3_1","volume-title":"Azure LLM inference trace","year":"2024","unstructured":"Azure. 2024. Azure LLM inference trace 2024. https:\/\/github.com\/Azure\/AzurePublicDataset\/blob\/master\/AzureLLMInferenceDataset2024.md."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.nlposs-1.24"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3698038.3698542"},{"key":"e_1_2_1_6_1","volume-title":"Ganger","author":"Berg Benjamin","year":"2020","unstructured":"Benjamin Berg, Daniel S. Berger, Sara McAllister, Isaac Grosof, Sathya Gunasekar, Jimmy Lu, Michael Uhlar, Jim Carrig, Nathan Beckmann, Mor Harchol-Balter, and Gregory R. Ganger. 2020. The CacheLib Caching Engine: Design and Experiences at Scale. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Virtual, 753-768. https:\/\/www.usenix.org\/conference\/osdi20\/presentation\/berg"},{"key":"e_1_2_1_7_1","first-page":"1","volume-title":"Understanding the Implications of Uncertainty in Embodied Carbon Models for Sustainable Computing. In Workshop on Sustainable Computer Systems (HotCarbon). ACM","author":"Bhagavathula Anvita","year":"2024","unstructured":"Anvita Bhagavathula, Leo Han, and Udit Gupta. 2024. Understanding the Implications of Uncertainty in Embodied Carbon Models for Sustainable Computing. In Workshop on Sustainable Computer Systems (HotCarbon). 
ACM, New York, NY, USA, 1-7."},{"key":"e_1_2_1_8_1","unstructured":"Hendrik Borghorst. 2018. rapl-read-ryzen. https:\/\/github.com\/djselbeck\/rapl-read-ryzen."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.1999.749260"},{"key":"e_1_2_1_10_1","volume-title":"The Thirteenth International Conference on Learning Representations (ICLR). OpenReview.net","author":"Chen Zhiliang","year":"2025","unstructured":"Zhiliang Chen, Xinyuan Niu, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. 2025. Broaden your SCOPE! Efficient Multi-turn Conversation Planning for LLMs with Semantic Space. In The Thirteenth International Conference on Learning Representations (ICLR). OpenReview.net, Singapore. https:\/\/openreview.net\/forum?id=3cgMU3TyyE"},{"key":"e_1_2_1_11_1","volume-title":"Do Large Language Models Need a Content Delivery Network? arXiv preprint arXiv:2409.13761","author":"Cheng Yihua","year":"2024","unstructured":"Yihua Cheng, Kuntai Du, Jiayi Yao, and Junchen Jiang. 2024. Do Large Language Models Need a Content Delivery Network? arXiv preprint arXiv:2409.13761 (2024)."},{"key":"e_1_2_1_12_1","unstructured":"COIN-OR Foundation. 2005-. CBC (Coin-or branch and cut) solver. https:\/\/github.com\/coin-or\/Cbc. Open-source MILP solver from the COIN-OR project."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.70"},{"key":"e_1_2_1_14_1","unstructured":"DeepSeek. 2025. DeepSeek. https:\/\/chat.deepseek.com\/."},{"key":"e_1_2_1_15_1","unstructured":"Dell Technologies. 2019. Life Cycle Assessment of Dell R740. https:\/\/www.delltechnologies.com\/asset\/en-us\/products\/servers\/technical-support\/Full_LCA_Dell_R740.pdf."},{"key":"e_1_2_1_16_1","first-page":"37","volume-title":"2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC). IEEE, IEEE","author":"Ding Yi","year":"2024","unstructured":"Yi Ding and Tianyao Shi. 2024. 
Sustainable LLM Serving: Environmental Implications, Challenges, and Opportunities. In 2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC). IEEE, IEEE, Austin, TX, USA, 37-38."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i16.29748"},{"key":"e_1_2_1_18_1","unstructured":"Electricity Maps. 2025. Electricity Maps. https:\/\/www.electricitymap.org\/map\/."},{"key":"e_1_2_1_19_1","volume-title":"LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models. In The Twelfth International Conference on Learning Representations (ICLR). OpenReview.net","author":"Faiz Ahmad","year":"2024","unstructured":"Ahmad Faiz, Sotaro Kaneda, Ruhan Wang, Rita Chukwunyere Osi, Prateek Sharma, Fan Chen, and Lei Jiang. 2024. LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models. In The Twelfth International Conference on Learning Representations (ICLR). OpenReview.net, Vienna, Austria. https:\/\/openreview.net\/forum?id=aIok3ZD9to"},{"key":"e_1_2_1_20_1","first-page":"111","volume-title":"Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention. In USENIX Annual Technical Conference (ATC). USENIX Association","author":"Gao Bin","year":"2024","unstructured":"Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, and Pengfei Zuo. 2024. Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention. In USENIX Annual Technical Conference (ATC). USENIX Association, Santa Clara, CA, 111-126. https:\/\/www.usenix.org\/conference\/atc24\/presentation\/gao-bin-cost"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3689031.3696072"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1298306.1298310"},{"key":"e_1_2_1_23_1","unstructured":"GitHub. 2024. copilot. https:\/\/github.com\/features\/copilot."},{"key":"e_1_2_1_24_1","unstructured":"Google. 2024. 
Gemini. https:\/\/gemini.google.com\/app."},{"key":"e_1_2_1_25_1","unstructured":"Sarah Griffiths. 2020. Why your internet habits are not as clean as you think. https:\/\/www.bbc.com\/future\/article\/20200305-why-your-internet-habits-are-not-as-clean-as-you-think."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527408"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3695053.3731023"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2014.6847969"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522722"},{"key":"e_1_2_1_30_1","unstructured":"Hugging Face. 2023. ShareGPT_Vicuna_unfiltered."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3676641.3716245"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1147"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI61997.2024.00096"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2018436.2018502"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607035"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607034"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 3rd Workshop on Sustainable Computer Systems (HotCarbon). ACM","author":"Li Yueying","year":"2024","unstructured":"Yueying Li, Omer Graif, and Udit Gupta. 2024. Towards Carbon-efficient LLM Life Cycle. In Proceedings of the 3rd Workshop on Sustainable Computer Systems (HotCarbon). ACM, New York, NY, USA."},{"key":"e_1_2_1_39_1","volume-title":"Ecoserve: Designing carbon-aware ai inference systems. arXiv preprint arXiv:2502.05043","author":"Li Yueying","year":"2025","unstructured":"Yueying Li, Zhanqiu Hu, Esha Choukse, Rodrigo Fonseca, G Edward Suh, and Udit Gupta. 2025. 
Ecoserve: Designing carbon-aware ai inference systems. arXiv preprint arXiv:2502.05043 (2025)."},{"key":"e_1_2_1_40_1","first-page":"663","volume-title":"AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association","author":"Li Zhuohan","year":"2023","unstructured":"Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023c. AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Boston, MA, 663-679. https:\/\/www.usenix.org\/conference\/osdi23\/presentation\/li-zhouhan"},{"key":"e_1_2_1_41_1","volume-title":"Advances in Neural Information Processing Systems","author":"Liu Shuo","year":"2024","unstructured":"Shuo Liu, Kaining Ying, Hao Zhang, Yue Yang, Yuqi Lin, Tianle Zhang, Chuanhao Li, Yu Qiao, Ping Luo, Wenqi Shao, and Kaipeng Zhang. 2024b. ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models. In Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang (Eds.), Vol. 37. Curran Associates, Inc., Vancouver, BC, Canada, 100734-100782. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2024\/file\/b69396afc07a9ca3428d194f4db84c02-Paper-Datasets_and_Benchmarks_Track.pdf"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3651890.3672274"},{"key":"e_1_2_1_43_1","unstructured":"LLMPerf. 2024. LLMPerf Leaderboard."},{"key":"e_1_2_1_44_1","unstructured":"LMCache Team. 2025a. KV Cache Size Calculator. https:\/\/lmcache.ai\/kv_cache_calculator.html."},{"key":"e_1_2_1_45_1","unstructured":"LMCache Team. 2025b. LMCache. 
https:\/\/lmcache.ai\/."},{"key":"e_1_2_1_46_1","first-page":"15630","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE","author":"Luo Chuwei","year":"2024","unstructured":"Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, and Cong Yao. 2024. LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Seattle, WA, USA, 15630-15640."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3604930.3605717"},{"key":"e_1_2_1_48_1","first-page":"287","volume-title":"Hyrax: Fail-in-Place Server Operation in Cloud Platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association","author":"Lyu Jialun","unstructured":"Jialun Lyu, Marisa You, Celine Irvene, Mark Jung, Tyler Narmore, Jacob Shapiro, Luke Marshall, Savyasachi Samal, Ioannis Manousakis, Lisa Hsu, Preetha Subbarayalu, Ashish Raniwala, Brijesh Warrier, Ricardo Bianchini, Bianca Schroeder, and Daniel S. Berger. 2023b. Hyrax: Fail-in-Place Server Operation in Cloud Platforms. In 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Boston, MA, USA, 287-304. https:\/\/www.usenix.org\/conference\/osdi23\/presentation\/lyu"},{"key":"e_1_2_1_49_1","first-page":"19","volume-title":"Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys). ACM","author":"Maji Diptyaroop","year":"2023","unstructured":"Diptyaroop Maji, Prashant Shenoy, and Ramesh K Sitaraman. 2023. Multi-Day Forecasting of Electric Grid Carbon Intensity Using Machine Learning. In Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys). 
ACM, New York, NY, USA, 19-33."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3538637.3538849"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3727200.3727211"},{"key":"e_1_2_1_52_1","first-page":"745","volume-title":"FairyWREN: A Sustainable Cache for Emerging Write-Read-Erase Flash Interfaces. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association","author":"McAllister Sara","unstructured":"Sara McAllister, Yucong ''Sherry'' Wang, Benjamin Berg, Daniel S. Berger, George Amvrosiadis, Nathan Beckmann, and Gregory R. Ganger. 2024b. FairyWREN: A Sustainable Cache for Emerging Write-Read-Erase Flash Interfaces. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Santa Clara, CA, 745-764. https:\/\/www.usenix.org\/conference\/osdi24\/presentation\/mcallister"},{"key":"e_1_2_1_53_1","unstructured":"Meta. 2024. Introducing Meta Llama 3: The most capable openly available LLM to date. https:\/\/ai.meta.com\/blog\/meta-llama-3\/."},{"key":"e_1_2_1_54_1","unstructured":"Micron. 2025. DDR4 SDRAM memory. https:\/\/www.micron.com\/products\/memory\/dram-components\/ddr4-sdram."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the 3rd Workshop on Sustainable Computer Systems (HotCarbon). ACM","author":"Nguyen Sophia","year":"2024","unstructured":"Sophia Nguyen, Beihao Zhou, Yi Ding, and Sihang Liu. 2024. Towards Sustainable Large Language Model Serving. In Proceedings of the 3rd Workshop on Sustainable Computer Systems (HotCarbon). ACM, New York, NY, USA."},{"key":"e_1_2_1_56_1","unstructured":"OpenAI. 2023. ChatGPT. https:\/\/chatgpt.com\/."},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE\/ACM","author":"Ostrouchov George","unstructured":"George Ostrouchov, Don Maxwell, Rizwan A. 
Ashraf, Christian Engelmann, Mallikarjun Shankar, and James H. Rogers. 2020. GPU lifetimes on titan supercomputer: Survival analysis and reliability. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC). IEEE\/ACM, Atlanta, Georgia, USA, 41."},{"key":"e_1_2_1_58_1","volume-title":"International Symposium on Computer Architecture (ISCA). IEEE Press","author":"Patel Pratyush","year":"2024","unstructured":"Pratyush Patel, Esha Choukse, Chaojie Zhang, \u00cd\u00f1igo Goiri, Aashaka Shah, Saeed Maleki, and Ricardo Bianchini. 2024. Splitwise improves GPU usage by splitting LLM inference phases. In International Symposium on Computer Architecture (ISCA). IEEE Press, Buenos Aires, Argentina, 118\u2013132."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-023-00958-w"},{"key":"e_1_2_1_60_1","unstructured":"PuLP developers. 2025. PuLP: A Python Linear Programming API."},{"key":"e_1_2_1_61_1","unstructured":"pyNVML Developers. 2025. pyNVML. https:\/\/pypi.org\/project\/nvidia-ml-py\/."},{"key":"e_1_2_1_62_1","volume-title":"Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving. arXiv:2407.00079 [cs.DC] https:\/\/arxiv.org\/abs\/2407.00079","author":"Qin Ruoyu","year":"2024","unstructured":"Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, and Xinran Xu. 2024. Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving. arXiv:2407.00079 [cs.DC] https:\/\/arxiv.org\/abs\/2407.00079"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC58863.2023.10363447"},{"key":"e_1_2_1_64_1","unstructured":"Samsung. 2023. Samsung V-NAND SSD 990 PRO. https:\/\/download.semiconductor.samsung.com\/resources\/data-sheet\/samsung_nvme_ssd_990_pro_datasheet_rev.2.0.pdf."},{"key":"e_1_2_1_65_1","unstructured":"Seagate. 2025. The Decarbonizing Data Report. 
https:\/\/www.seagate.com\/ca\/en\/resources\/decarbonizing-data-report\/."},{"key":"e_1_2_1_66_1","unstructured":"ShareGPT. 2023. ShareGPT."},{"key":"e_1_2_1_67_1","unstructured":"Tianyao Shi, Yanran Wu, Sihang Liu, and Yi Ding. 2024. GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions. arXiv:2412.20322 [cs.AR] https:\/\/arxiv.org\/abs\/2412.20322"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2025.3630094"},{"key":"e_1_2_1_69_1","volume-title":"Smith et al","author":"Taylor","year":"2017","unstructured":"Taylor G. Smith et al., 2017-. pmdarima: ARIMA estimators for Python. http:\/\/www.alkaline-ml.com\/pmdarima"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA61900.2025.00102"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3630614.3630616"},{"key":"e_1_2_1_72_1","unstructured":"Gemini Team. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv:2403.05530 [cs.CL] https:\/\/arxiv.org\/abs\/2403.05530"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA59077.2024.00041"},{"key":"e_1_2_1_74_1","unstructured":"Vinnie Wong. 2023. Gen AI's Environmental Ledger: A Closer Look at the Carbon Footprint of ChatGPT. https:\/\/piktochart.com\/blog\/carbon-footprint-of-chatgpt\/."},{"key":"e_1_2_1_75_1","volume-title":"Proceedings of Machine Learning and Systems (MLSys)","author":"Wu Carole-Jean","year":"2022","unstructured":"Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga, Jinshi Huang, Charles Bai, et al., 2022. Sustainable AI: Environmental implications, challenges and opportunities. 
Proceedings of Machine Learning and Systems (MLSys) (2022)."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3679240.3734630"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3689031.3696098"},{"key":"e_1_2_1_78_1","first-page":"521","volume-title":"Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association","author":"Yu Gyeong-In","year":"2022","unstructured":"Gyeong-In Yu, Joo Seong Jeong, Geon-Woo Kim, Soojeong Kim, and Byung-Gon Chun. 2022. Orca: A Distributed Serving System for Transformer-Based Generative Models. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Carlsbad, CA, 521-538. https:\/\/www.usenix.org\/conference\/osdi22\/presentation\/yu"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13677-025-00770-9"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-024-68339-1"},{"key":"e_1_2_1_81_1","first-page":"193","volume-title":"DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association","author":"Zhong Yinmin","year":"2024","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, Santa Clara, CA, 193-210. 
https:\/\/www.usenix.org\/conference\/osdi24\/presentation\/zhong-yinmin"}],"container-title":["Proceedings of the ACM on Measurement and Analysis of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3788087","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T18:50:42Z","timestamp":1774551042000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3788087"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,26]]},"references-count":81,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,3,26]]}},"alternative-id":["10.1145\/3788087"],"URL":"https:\/\/doi.org\/10.1145\/3788087","relation":{},"ISSN":["2476-1249"],"issn-type":[{"value":"2476-1249","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,26]]},"assertion":[{"value":"2026-03-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}