{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T15:45:14Z","timestamp":1780501514085,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,3,30]],"date-time":"2025-03-30T00:00:00Z","timestamp":1743292800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,3,30]]},"DOI":"10.1145\/3721146.3721953","type":"proceedings-article","created":{"date-parts":[[2025,4,1]],"date-time":"2025-04-01T17:42:05Z","timestamp":1743529325000},"page":"208-215","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8714-5647","authenticated-orcid":false,"given":"Patrick","family":"Wilhelm","sequence":"first","affiliation":[{"name":"BIFOLD, Berlin, Germany"},{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5154-7813","authenticated-orcid":false,"given":"Thorsten","family":"Wittkopp","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6454-6799","authenticated-orcid":false,"given":"Odej","family":"Kao","sequence":"additional","affiliation":[{"name":"Technische Universit\u00e4t Berlin, Berlin, 
Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,4]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Edward Beeching Lewis Tunstall and Sasha Rush. [n. d.]. Scaling test-time compute with open models. https:\/\/huggingface.co\/spaces\/HuggingFaceH4\/blogpost-scaling-test-time-compute"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","unstructured":"Benoit Courty Victor Schmidt Sasha Luccioni Goyal-Kamal MarionCoutarel Boris Feld J\u00e9r\u00e9my Lecourt LiamConnell Amine Saboni Inimaz supatomic Mathilde L\u00e9val Luis Blanche Alexis Cruveiller ouminasara Franklin Zhao Aditya Joshi Alexis Bogroff Hugues de Lavoreille Niko Laskaris Edoardo Abati Douglas Blank Ziyao Wang Armin Catovic Marc Alencon Micha\u0142 St\u0119ch\u0142y Christian Bauer Lucas Ot\u00e1vio N. de Ara\u00fajo JPW and MinervaBooks. 2024. mlco2\/codecarbon: v2.4.1. 10.5281\/zenodo.11171501","DOI":"10.5281\/zenodo.11171501"},{"key":"e_1_3_2_1_3_1","volume-title":"Ying Wen, Weinan Zhang, and Jun Wang.","author":"Feng Xidong","year":"2023","unstructured":"Xidong Feng, Ziyu Wan, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, and Jun Wang. 2023. Alphazero-like tree-search can guide large language model decoding and training. arXiv preprint arXiv:2309.17179 (2023)."},{"key":"e_1_3_2_1_4_1","unstructured":"Daya Guo Dejian Yang Haowei Zhang Junxiao Song Ruoyu Zhang Runxin Xu Qihao Zhu Shirong Ma Peiyi Wang Xiao Bi et al. 2025. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527408"},{"key":"e_1_3_2_1_6_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149","author":"Han Song","year":"2015","unstructured":"Song Han, Huizi Mao, and William J Dally. 2015. 
Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015)."},{"key":"e_1_3_2_1_7_1","volume-title":"Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300","author":"Hendrycks Dan","year":"2020","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2020. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300 (2020)."},{"key":"e_1_3_2_1_8_1","volume-title":"Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al.","author":"Hoffmann Jordan","year":"2022","unstructured":"Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. 2022. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556 (2022)."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00286"},{"key":"e_1_3_2_1_10_1","volume-title":"Compressing llms: The truth is rarely pure and never simple. arXiv preprint arXiv:2310.01382","author":"Jaiswal Ajay","year":"2023","unstructured":"Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, and Yinfei Yang. 2023. Compressing llms: The truth is rarely pure and never simple. arXiv preprint arXiv:2310.01382 (2023)."},{"key":"e_1_3_2_1_11_1","volume-title":"Scaling laws for neural language models. arXiv preprint arXiv:2001.08361","author":"Kaplan Jared","year":"2020","unstructured":"Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. 
arXiv preprint arXiv:2001.08361 (2020)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_3_2_1_13_1","unstructured":"Baolin Li Yankai Jiang Vijay Gadepally and Devesh Tiwari. 2024. Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference. arXiv:2403.12900 [cs.DC] https:\/\/arxiv.org\/abs\/2403.12900"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607034"},{"key":"e_1_3_2_1_15_1","volume-title":"Let's verify step by step. arXiv preprint arXiv:2305.20050","author":"Lightman Hunter","year":"2023","unstructured":"Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. 2023. Let's verify step by step. arXiv preprint arXiv:2305.20050 (2023)."},{"key":"e_1_3_2_1_16_1","first-page":"87","article-title":"AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration","volume":"6","author":"Lin Ji","year":"2024","unstructured":"Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. 2024. AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration. Proceedings of Machine Learning and Systems 6 (2024), 87--100.","journal-title":"Proceedings of Machine Learning and Systems"},{"key":"e_1_3_2_1_17_1","volume-title":"Counting carbon: A survey of factors influencing the emissions of machine learning. arXiv preprint arXiv:2302.08476","author":"Luccioni Alexandra Sasha","year":"2023","unstructured":"Alexandra Sasha Luccioni and Alex Hernandez-Garcia. 2023. Counting carbon: A survey of factors influencing the emissions of machine learning. arXiv preprint arXiv:2302.08476 (2023)."},{"key":"e_1_3_2_1_18_1","unstructured":"Alexandra Sasha Luccioni Emma Strubell and Kate Crawford. 2025. 
From Efficiency Gains to Rebound Effects: The Problem of Jevons' Paradox in AI's Polarized Environmental Debate. arXiv:2501.16548 [cs.CY] https:\/\/arxiv.org\/abs\/2501.16548"},{"key":"e_1_3_2_1_19_1","first-page":"1","article-title":"Estimating the carbon footprint of bloom, a 176b parameter language model","volume":"24","author":"Luccioni Alexandra Sasha","year":"2023","unstructured":"Alexandra Sasha Luccioni, Sylvain Viguier, and Anne-Laure Ligozat. 2023. Estimating the carbon footprint of bloom, a 176b parameter language model. Journal of Machine Learning Research 24, 253 (2023), 1--15.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_20_1","volume-title":"The 2024 ACM Conference on Fairness, Accountability, and Transparency. 85--99","author":"Luccioni Sasha","year":"2024","unstructured":"Sasha Luccioni, Yacine Jernite, and Emma Strubell. 2024. Power hungry processing: Watts driving the cost of AI deployment?. In The 2024 ACM Conference on Fairness, Accountability, and Transparency. 85--99."},{"key":"e_1_3_2_1_21_1","volume-title":"International conference on machine learning. PMLR","author":"Rajbhandari Samyam","year":"2022","unstructured":"Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. 2022. Deepspeed-moe: Advancing mixture-of-experts inference and training to power next-generation ai scale. In International conference on machine learning. PMLR, 18332--18346."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC58863.2023.10363447"},{"key":"e_1_3_2_1_23_1","volume-title":"Beyond chinchilla-optimal: Accounting for inference in language model scaling laws. arXiv preprint arXiv:2401.00448","author":"Sardana Nikhil","year":"2023","unstructured":"Nikhil Sardana, Jacob Portes, Sasha Doubov, and Jonathan Frankle. 2023. Beyond chinchilla-optimal: Accounting for inference in language model scaling laws. 
arXiv preprint arXiv:2401.00448 (2023)."},{"key":"e_1_3_2_1_24_1","volume-title":"Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314","author":"Snell Charlie","year":"2024","unstructured":"Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. 2024. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314 (2024)."},{"key":"e_1_3_2_1_25_1","volume-title":"Dynamollm: Designing llm inference clusters for performance and energy efficiency. arXiv preprint arXiv:2408.00741","author":"Stojkovic Jovan","year":"2024","unstructured":"Jovan Stojkovic, Chaojie Zhang, \u00cd\u00f1igo Goiri, Josep Torrellas, and Esha Choukse. 2024. Dynamollm: Designing llm inference clusters for performance and energy efficiency. arXiv preprint arXiv:2408.00741 (2024)."},{"key":"e_1_3_2_1_26_1","volume-title":"A simple and effective pruning approach for large language models. arXiv preprint arXiv:2306.11695","author":"Sun Mingjie","year":"2023","unstructured":"Mingjie Sun, Zhuang Liu, Anna Bair, and J Zico Kolter. 2023. A simple and effective pruning approach for large language models. arXiv preprint arXiv:2306.11695 (2023)."},{"key":"e_1_3_2_1_27_1","volume-title":"Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"e_1_3_2_1_28_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. 
arXiv preprint arXiv:2307.09288 (2023)."},{"key":"e_1_3_2_1_29_1","volume-title":"Solving math word problems with process-and outcome-based feedback. arXiv preprint arXiv:2211.14275","author":"Uesato Jonathan","year":"2022","unstructured":"Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, and Irina Higgins. 2022. Solving math word problems with process-and outcome-based feedback. arXiv preprint arXiv:2211.14275 (2022)."},{"key":"e_1_3_2_1_30_1","volume-title":"Attention is all you need. Advances in Neural Information Processing Systems","author":"Vaswani A","year":"2017","unstructured":"A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems (2017)."},{"key":"e_1_3_2_1_31_1","unstructured":"Pablo Villalobos and David Atkinson. 2023. Trading Off Compute in Training and Inference. https:\/\/epoch.ai\/blog\/trading-off-compute-in-training-and-inference Accessed: 2025-01-08."},{"key":"e_1_3_2_1_32_1","volume-title":"Math-shepherd: A label-free step-by-step verifier for llms in mathematical reasoning. arXiv preprint arXiv:2312.08935","author":"Wang Peiyi","year":"2023","unstructured":"Peiyi Wang, Lei Li, Zhihong Shao, RX Xu, Damai Dai, Yifei Li, Deli Chen, Y Wu, and Zhifang Sui. 2023. Math-shepherd: A label-free step-by-step verifier for llms in mathematical reasoning. arXiv preprint arXiv:2312.08935 (2023)."},{"key":"e_1_3_2_1_33_1","volume-title":"Aakanksha Chowdhery, and Denny Zhou.","author":"Wang Xuezhi","year":"2022","unstructured":"Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171 (2022)."},{"key":"e_1_3_2_1_34_1","volume-title":"Chain-of-thought reasoning without prompting. arXiv preprint arXiv:2402.10200","author":"Wang Xuezhi","year":"2024","unstructured":"Xuezhi Wang and Denny Zhou. 
2024. Chain-of-thought reasoning without prompting. arXiv preprint arXiv:2402.10200 (2024)."},{"key":"e_1_3_2_1_35_1","volume-title":"Denny Zhou, et al.","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 35 (2022), 24824--24837."},{"key":"e_1_3_2_1_36_1","volume-title":"Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems. arXiv preprint arXiv:2407.04014","author":"Wilkins Grant","year":"2024","unstructured":"Grant Wilkins, Srinivasan Keshav, and Richard Mortier. 2024. Offline Energy-Optimal LLM Serving: Workload-Based Energy Models for LLM Inference on Heterogeneous Systems. arXiv preprint arXiv:2407.04014 (2024)."},{"key":"e_1_3_2_1_37_1","unstructured":"Huajian Xin ZZ Ren Junxiao Song Zhihong Shao Wanjia Zhao Haocheng Wang Bo Liu Liyue Zhang Xuan Lu Qiushi Du et al. 2024. Deepseek-prover-v1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search. arXiv preprint arXiv:2408.08152 (2024)."},{"key":"e_1_3_2_1_38_1","first-page":"46595","article-title":"Judging llm-as-a-judge with mt-bench and chatbot arena","volume":"36","author":"Zheng Lianmin","year":"2023","unstructured":"Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. 2023. Judging llm-as-a-judge with mt-bench and chatbot arena. 
Advances in Neural Information Processing Systems 36 (2023), 46595--46623.","journal-title":"Advances in Neural Information Processing Systems"}],"event":{"name":"EuroMLSys '25: 5th Workshop on Machine Learning and Systems","location":"World Trade Center Rotterdam Netherlands","acronym":"EuroMLSys '25","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"]},"container-title":["Proceedings of the 5th Workshop on Machine Learning and Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721146.3721953","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721146.3721953","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:39Z","timestamp":1750298259000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721146.3721953"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,30]]},"references-count":38,"alternative-id":["10.1145\/3721146.3721953","10.1145\/3721146"],"URL":"https:\/\/doi.org\/10.1145\/3721146.3721953","relation":{},"subject":[],"published":{"date-parts":[[2025,3,30]]},"assertion":[{"value":"2025-04-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}