{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T09:13:40Z","timestamp":1769505220456,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":90,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62025404, 62222411"],"award-info":[{"award-number":["62025404, 62222411"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2023YFB4404400"],"award-info":[{"award-number":["2023YFB4404400"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,18]]},"DOI":"10.1145\/3725843.3756038","type":"proceedings-article","created":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T17:19:56Z","timestamp":1760721596000},"page":"1160-1177","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["ReGate: Enabling Power Gating in Neural Processing Units"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-0363-9486","authenticated-orcid":false,"given":"Yuqi","family":"Xue","sequence":"first","affiliation":[{"name":"University of Illinois Urbana-Champaign, Urbana, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1125-671X","authenticated-orcid":false,"given":"Jian","family":"Huang","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana-Champaign, Urbana, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,10,17]]},"reference":[{"key":"e_1_3_3_2_2_2","unstructured":"2024. The AI Spending Spree in Charts. 
https:\/\/www.wsj.com\/tech\/ai\/artificial-intelligence-investing-charts-7b8e1a97"},{"key":"e_1_3_3_2_3_2","unstructured":"2024. Big Tech\u2019s AI Surge Spikes Energy Use. https:\/\/www.baselinemag.com\/news\/big-techs-ai-surge-spikes-energy-use\/"},{"key":"e_1_3_3_2_4_2","unstructured":"2024. Energy-hungry Google AI shoots emissions up by 48% as data centers devour power. https:\/\/interestingengineering.com\/energy\/googles-emissions-jump-48-due-to-ai"},{"key":"e_1_3_3_2_5_2","unstructured":"2025. Inside Amazon\u2019s Race to Build the AI Industry\u2019s Biggest Datacenters. https:\/\/time.com\/7273288\/amazon-anthropic-openai-microsoft-stargate-datacenters\/"},{"key":"e_1_3_3_2_6_2","unstructured":"2025. TensorFlow Model Garden. https:\/\/github.com\/tensorflow\/models"},{"key":"e_1_3_3_2_7_2","unstructured":"2025. US utilities grapple with Big Tech\u2019s massive power demands for data centers. https:\/\/www.reuters.com\/business\/energy\/us-utilities-grapple-with-big-techs-massive-power-demands-data-centers-2025-04-07\/"},{"key":"e_1_3_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540719"},{"key":"e_1_3_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2934583.2934606"},{"key":"e_1_3_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42613.2021.9365791"},{"key":"e_1_3_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/RAEEUCCI61380.2024.10547970"},{"key":"e_1_3_3_2_12_2","unstructured":"AMD. [n. d.]. AXI High Bandwidth Memory Controller LogiCORE IP Product Guide (PG276). 
https:\/\/docs.amd.com\/r\/en-US\/pg276-axi-hbm"},{"key":"e_1_3_3_2_13_2","doi-asserted-by":"publisher","unstructured":"Jason Ansel Edward Yang Horace He Natalia Gimelshein Animesh Jain Michael Voznesensky Bin Bao Peter Bell David Berard Evgeni Burovski Geeta Chauhan Anjali Chourdia Will Constable Alban Desmaison Zachary DeVito Elias Ellison Will Feng Jiong Gong Michael Gschwind Brian Hirsh Sherlock Huang Kshiteej Kalambarkar Laurent Kirsch Michael Lazos Mario Lezcano Yanbo Liang Jason Liang Yinghai Lu C.\u00a0K. Luk Bert Maher Yunjie Pan Christian Puhrsch Matthias Reso Mark Saroufim Marcos\u00a0Yukio Siraichi Helen Suk Shunting Zhang Michael Suo Phil Tillet Xu Zhao Eikan Wang Keren Zhou Richard Zou Xiaodong Wang Ajit Mathews William Wen Gregory Chanan Peng Wu and Soumith Chintala. 2024. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation(ASPLOS \u201924). Association for Computing Machinery New York NY USA 929\u2013947. 10.1145\/3620665.3640366","DOI":"10.1145\/3620665.3640366"},{"key":"e_1_3_3_2_14_2","unstructured":"OpenXLA Authors. 2024. Developing a new backend for XLA. https:\/\/openxla.org\/xla\/developing_new_backend"},{"key":"e_1_3_3_2_15_2","unstructured":"AWS. 2025. Trainium Architecture. https:\/\/awsdocs-neuron.readthedocs-hosted.com\/en\/latest\/general\/arch\/neuron-hardware\/trainium.html"},{"key":"e_1_3_3_2_16_2","unstructured":"Amazon AWS. 2023. AWS Inferentia. https:\/\/aws.amazon.com\/machine-learning\/inferentia\/"},{"key":"e_1_3_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC19947.2020.9062967"},{"key":"e_1_3_3_2_18_2","volume-title":"Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. 
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). Carlsbad, CA."},{"key":"e_1_3_3_2_19_2","first-page":"851","volume-title":"2023 USENIX Annual Technical Conference (USENIX ATC 23)","author":"Choi Sangjin","year":"2023","unstructured":"Sangjin Choi, Inhoe Koo, Jeongseob Ahn, Myeongjae Jeon, and Youngjin Kwon. 2023. EnvPipe: Performance-preserving DNN Training Framework for Saving Energy. In 2023 USENIX Annual Technical Conference (USENIX ATC 23). USENIX Association, Boston, MA, 851\u2013864. https:\/\/www.usenix.org\/conference\/atc23\/presentation\/choi"},{"key":"e_1_3_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3694715.3695970"},{"key":"e_1_3_3_2_21_2","doi-asserted-by":"publisher","unstructured":"Lawrence\u00a0T. Clark Vinay Vashishtha Lucian Shifren Aditya Gujja Saurabh Sinha Brian Cline Chandarasekaran Ramamurthy and Greg Yeric. 2016. ASAP7: A 7-nm finFET predictive process design kit. Microelectronics Journal 53 (2016) 105\u2013115. 10.1016\/j.mejo.2016.04.006","DOI":"10.1016\/j.mejo.2016.04.006"},{"key":"e_1_3_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3695053.3731409"},{"key":"e_1_3_3_2_23_2","unstructured":"Intel Corporation. 2020. C\u2011State. https:\/\/www.intel.com\/content\/www\/us\/en\/docs\/socwatch\/user-guide\/2020\/c-state.html"},{"key":"e_1_3_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/IGCC.2014.7039170"},{"key":"e_1_3_3_2_25_2","unstructured":"OpenXLA Developers. 2025. XLA Tooling. https:\/\/openxla.org\/xla\/tools#hlo-opt_hlo_pass_development_and_debugging"},{"key":"e_1_3_3_2_26_2","unstructured":"Atul Dhamba and Anand\u00a0V Kulkarni. [n. d.]. Design Considerations for High Bandwidth Memory Controller. 
https:\/\/www.design-reuse.com\/articles\/41186\/design-considerations-for-high-bandwidth-memory-controller.html"},{"key":"e_1_3_3_2_27_2","unstructured":"Ahmad Faiz Sotaro Kaneda Ruhan Wang Rita Osi Prateek Sharma Fan Chen and Lei Jiang. 2024. LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models. arxiv:https:\/\/arXiv.org\/abs\/2309.14393\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/2309.14393"},{"key":"e_1_3_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/545214.545232"},{"key":"e_1_3_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2016.70"},{"key":"e_1_3_3_2_30_2","unstructured":"Google. 2022. System Architecture - Cloud TPU. https:\/\/cloud.google.com\/tpu\/docs\/system-architecture-tpu-vm"},{"key":"e_1_3_3_2_31_2","unstructured":"Google. 2023. XLA: Optimizing Compiler for Machine Learning. https:\/\/www.tensorflow.org\/xla"},{"key":"e_1_3_3_2_32_2","unstructured":"Google. 2024. 2024 Environmental Report. https:\/\/www.gstatic.com\/gumdrop\/sustainability\/google-2024-environmental-report.pdf"},{"key":"e_1_3_3_2_33_2","unstructured":"Google. 2025. Growing the internet while reducing energy consumption. https:\/\/datacenters.google\/efficiency\/"},{"key":"e_1_3_3_2_34_2","unstructured":"Aaron Grattafiori et\u00a0al. 2024. The Llama 3 Herd of Models. 
arxiv:https:\/\/arXiv.org\/abs\/2407.21783\u00a0[cs.AI] https:\/\/arxiv.org\/abs\/2407.21783"},{"key":"e_1_3_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527408"},{"key":"e_1_3_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00076"},{"key":"e_1_3_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1013235.1013249"},{"key":"e_1_3_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589350"},{"key":"e_1_3_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00010"},{"key":"e_1_3_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42613.2021.9366001"},{"key":"e_1_3_3_2_41_2","doi-asserted-by":"publisher","unstructured":"Joshua Kalyanapu Farshad Dizani Azam Ghanbari Darsh Asher and Samira\u00a0Mirbagher Ajorpaz. 2025. Exploiting Intel AMX Power Gating. IEEE Computer Architecture Letters 24 1 (2025) 113\u2013116. 10.1109\/LCA.2025.3555183","DOI":"10.1109\/LCA.2025.3555183"},{"key":"e_1_3_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480063"},{"key":"e_1_3_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC49657.2024.10454301"},{"key":"e_1_3_3_2_44_2","unstructured":"Patrick Kennedy. 2024. Tenstorrent Blackhole and Metalium For Standalone AI Processing. https:\/\/www.servethehome.com\/tenstorrent-blackhole-and-metalium-for-standalone-ai-processing\/"},{"key":"e_1_3_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/HCS52781.2021.9567075"},{"key":"e_1_3_3_2_46_2","doi-asserted-by":"publisher","unstructured":"Rakesh Kumar Alejandro Mart\u00ednez and Antonio Gonz\u00e1lez. 2014. Efficient Power Gating of SIMD Accelerators Through Dynamic Selective Devectorization in an HW\/SW Codesigned Environment. ACM Trans. Archit. Code Optim. 11 3 Article 25 (July 2014) 23\u00a0pages. 10.1145\/2629681","DOI":"10.1145\/2629681"},{"key":"e_1_3_3_2_47_2","doi-asserted-by":"publisher","unstructured":"Michael\u00a0A. 
Laurenzano Yunqi Zhang Jiang Chen Lingjia Tang and Jason Mars. 2016. PowerChop: identifying and managing non-critical units in hybrid processor architectures. SIGARCH Comput. Archit. News 44 3 (June 2016) 140\u2013152. 10.1145\/3007787.3001152","DOI":"10.1145\/3007787.3001152"},{"key":"e_1_3_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607035"},{"key":"e_1_3_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"key":"e_1_3_3_2_50_2","unstructured":"Yueying Li Zhanqiu Hu Esha Choukse Rodrigo Fonseca G.\u00a0Edward Suh and Udit Gupta. 2025. EcoServe: Designing Carbon-Aware AI Inference Systems. arxiv:https:\/\/arXiv.org\/abs\/2502.05043\u00a0[cs.DC] https:\/\/arxiv.org\/abs\/2502.05043"},{"key":"e_1_3_3_2_51_2","unstructured":"Yuheng Li Haotian Liu Qingyang Wu Fangzhou Mu Jianwei Yang Jianfeng Gao Chunyuan Li and Yong\u00a0Jae Lee. 2023. GLIGEN: Open-Set Grounded Text-to-Image Generation. arxiv:https:\/\/arXiv.org\/abs\/2301.07093\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2301.07093"},{"key":"e_1_3_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3694715.3695955"},{"key":"e_1_3_3_2_53_2","unstructured":"Yiqi Liu Yuqi Xue Noelle Crawford Jilong Xue and Jian Huang. 2025. Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques. arxiv:https:\/\/arXiv.org\/abs\/2507.11506\u00a0[cs.AR] https:\/\/arxiv.org\/abs\/2507.11506"},{"key":"e_1_3_3_2_54_2","unstructured":"Stefano Lovati. 2023. Recent advances in negative capacitance gate-all-around field effect transistors. https:\/\/www.powerelectronicsnews.com\/recent-advances-in-negative-capacitance-gate-all-around-field-effect-transistors\/. Accessed: 2025-06-12."},{"key":"e_1_3_3_2_55_2","unstructured":"Asit Mishra Jorge\u00a0Albericio Latorre Jeff Pool Darko Stosic Dusan Stosic Ganesh Venkatesh Chong Yu and Paulius Micikevicius. 2021. Accelerating Sparse Deep Neural Networks. 
arxiv:https:\/\/arXiv.org\/abs\/2104.08378\u00a0[cs.LG] https:\/\/arxiv.org\/abs\/2104.08378"},{"key":"e_1_3_3_2_56_2","unstructured":"Ann Mutschler. 2017. Power Challenges At 10nm And Below. https:\/\/semiengineering.com\/power-challenges-at-10nm-and-below\/. Accessed: 2025-06-12."},{"key":"e_1_3_3_2_57_2","unstructured":"Ann Mutschler. 2018. Power Issues Grow For Cloud Chips. https:\/\/semiengineering.com\/power-issues-grow-in-high-performance-computing\/. Accessed: 2025-06-12."},{"key":"e_1_3_3_2_58_2","unstructured":"Maxim Naumov Dheevatsa Mudigere Hao-Jun\u00a0Michael Shi Jianyu Huang Narayanan Sundaraman Jongsoo Park Xiaodong Wang Udit Gupta Carole-Jean Wu Alisson\u00a0G. Azzolini Dmytro Dzhulgakov Andrey Mallevich Ilia Cherniavskii Yinghai Lu Raghuraman Krishnamoorthi Ansha Yu Volodymyr Kondratenko Stephanie Pereira Xianjie Chen Wenlin Chen Vijay Rao Bill Jia Liang Xiong and Misha Smelyanskiy. 2019. Deep Learning Recommendation Model for Personalization and Recommendation Systems. CoRR abs\/1906.00091 (2019). https:\/\/arxiv.org\/abs\/1906.00091"},{"key":"e_1_3_3_2_59_2","doi-asserted-by":"publisher","unstructured":"Thomas Norrie Nishant Patil Doe\u00a0Hyun Yoon George Kurian Sheng Li James Laudon Cliff Young Norman Jouppi and David Patterson. 2021. The Design Process for Google\u2019s Training Chips: TPUv2 and TPUv3. IEEE Micro 41 2 (2021) 56\u201363. 10.1109\/MM.2021.3058217","DOI":"10.1109\/MM.2021.3058217"},{"key":"e_1_3_3_2_60_2","unstructured":"NVIDIA. [n. d.]. TensorRT SDK. https:\/\/developer.nvidia.com\/tensorrt"},{"key":"e_1_3_3_2_61_2","unstructured":"NVIDIA. 2022. NVIDIA H100 Tensor Core GPU Architecture. 
https:\/\/www.advancedclustering.com\/wp-content\/uploads\/2022\/03\/gtc22-whitepaper-hopper.pdf"},{"key":"e_1_3_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18074.2021.9586224"},{"key":"e_1_3_3_2_63_2","doi-asserted-by":"publisher","unstructured":"David Patterson Joseph Gonzalez Urs H\u00f6lzle Quoc Le Chen Liang Lluis-Miquel Munguia Daniel Rothchild David\u00a0R. So Maud Texier and Jeff Dean. 2022. The Carbon Footprint of Machine Learning Training Will Plateau Then Shrink. Computer 55 7 (2022) 18\u201328. 10.1109\/MC.2022.3148714","DOI":"10.1109\/MC.2022.3148714"},{"key":"e_1_3_3_2_64_2","unstructured":"David Patterson Joseph Gonzalez Quoc Le Chen Liang Lluis-Miquel Munguia Daniel Rothchild David So Maud Texier and Jeff Dean. 2021. Carbon Emissions and Large Neural Network Training. arxiv:https:\/\/arXiv.org\/abs\/2104.10350\u00a0[cs.LG] https:\/\/arxiv.org\/abs\/2104.10350"},{"key":"e_1_3_3_2_65_2","unstructured":"William Peebles and Saining Xie. 2023. Scalable Diffusion Models with Transformers. arxiv:https:\/\/arXiv.org\/abs\/2212.09748\u00a0[cs.CV] https:\/\/arxiv.org\/abs\/2212.09748"},{"key":"e_1_3_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/344166.344526"},{"key":"e_1_3_3_2_67_2","first-page":"75","volume-title":"2024 USENIX Annual Technical Conference (USENIX ATC 24)","author":"Qiu Haoran","year":"2024","unstructured":"Haoran Qiu, Weichao Mao, Archit Patke, Shengkun Cui, Saurabh Jha, Chen Wang, Hubertus Franke, Zbigniew Kalbarczyk, Tamer Ba\u015far, and Ravishankar\u00a0K. Iyer. 2024. Power-aware Deep Learning Model Serving with \u03bc -Serve. In 2024 USENIX Annual Technical Conference (USENIX ATC 24). USENIX Association, Santa Clara, CA, 75\u201393. https:\/\/www.usenix.org\/conference\/atc24\/presentation\/qiu"},{"key":"e_1_3_3_2_68_2","unstructured":"Rambus. [n. d.]. HBM3E \/ HBM3 Controller IP. 
https:\/\/www.rambus.com\/interface-ip\/hbm\/hbm3-controller\/"},{"key":"e_1_3_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080212"},{"key":"e_1_3_3_2_70_2","doi-asserted-by":"publisher","unstructured":"Hafeez Raza Shivendra Singh\u00a0Parihar Yogesh Singh\u00a0Chauhan Hussam Amrouch and Avinash Lahgere. 2025. An Investigation of Minimum Supply Voltage of 5-nm SRAM From 300 K Down to 10 K. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 11 (2025) 42\u201350. 10.1109\/JXCDC.2025.3560215","DOI":"10.1109\/JXCDC.2025.3560215"},{"key":"e_1_3_3_2_71_2","unstructured":"Vijay\u00a0Janapa Reddi Christine Cheng David Kanter Peter Mattson Guenther Schmuelling Carole-Jean Wu Brian Anderson Maximilien Breughe Mark Charlebois William Chou Ramesh Chukka Cody Coleman Sam Davis Pan Deng Greg Diamos Jared Duke Dave Fick J.\u00a0Scott Gardner Itay Hubara Sachin Idgunji Thomas\u00a0B. Jablin Jeff Jiao Tom\u00a0St. John Pankaj Kanwar David Lee Jeffery Liao Anton Lokhmotov Francisco Massa Peng Meng Paulius Micikevicius Colin Osborne Gennady Pekhimenko Arun Tejusve\u00a0Raghunath Rajan Dilip Sequeira Ashish Sirasao Fei Sun Hanlin Tang Michael Thomson Frank Wei Ephrem Wu Lingjie Xu Koichi Yamada Bing Yu George Yuan Aaron Zhong Peizhao Zhang and Yuchen Zhou. 2020. MLPerf Inference Benchmark. arxiv:https:\/\/arXiv.org\/abs\/1911.02549"},{"key":"e_1_3_3_2_72_2","unstructured":"Dan Robinson. 2024. Google now \u2019third-largest\u2019 in datacenter processors. https:\/\/www.theregister.com\/2024\/05\/21\/google_now_thirdlargest_in_datacenter\/"},{"key":"e_1_3_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2009.5117928"},{"key":"e_1_3_3_2_74_2","unstructured":"RUN:AI. 2022. Google TPU Architecture and Performance Best Practices. 
https:\/\/www.run.ai\/guides\/cloud-deep-learning\/google-tpu"},{"key":"e_1_3_3_2_75_2","doi-asserted-by":"publisher","unstructured":"Mohammad Sadrosadati Seyed\u00a0Borna Ehsani Hajar Falahati Rachata Ausavarungnirun Arash Tavakkol Mojtaba Abaee Lois Orosa Yaohua Wang Hamid Sarbazi-Azad and Onur Mutlu. 2019. ITAP: Idle-Time-Aware Power Management for GPU Execution Units. ACM Trans. Archit. Code Optim. 16 1 Article 3 (Feb. 2019) 26\u00a0pages. 10.1145\/3291606","DOI":"10.1145\/3291606"},{"key":"e_1_3_3_2_76_2","unstructured":"Ian Schneider Hui Xu Stephan Benecke David Patterson Keguo Huang Parthasarathy Ranganathan and Cooper Elsworth. 2025. Life-Cycle Emissions of AI Hardware: A Cradle-To-Grave Approach and Generational Trends. arxiv:https:\/\/arXiv.org\/abs\/2502.01671\u00a0[cs.AR] https:\/\/arxiv.org\/abs\/2502.01671"},{"key":"e_1_3_3_2_77_2","unstructured":"Mohammad Shoeybi Mostofa Patwary Raul Puri Patrick LeGresley Jared Casper and Bryan Catanzaro. 2020. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. arxiv:https:\/\/arXiv.org\/abs\/1909.08053\u00a0[cs.CL] https:\/\/arxiv.org\/abs\/1909.08053"},{"key":"e_1_3_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/3676641.3716025"},{"key":"e_1_3_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA61900.2025.00102"},{"key":"e_1_3_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42615.2023.10067817"},{"key":"e_1_3_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00075"},{"key":"e_1_3_3_2_82_2","unstructured":"Wendy Torell. [n. d.]. Liquid vs. Air Cooling. Which is the Capex winner? https:\/\/blog.se.com\/datacenter\/architecture\/2020\/02\/24\/liquid-vs-air-cooling-which-is-the-capex-winner\/"},{"key":"e_1_3_3_2_83_2","unstructured":"Hugo Touvron et\u00a0al. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. 
arxiv:https:\/\/arXiv.org\/abs\/2307.09288\u00a0[cs.CL]"},{"key":"e_1_3_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA59077.2024.00041"},{"key":"e_1_3_3_2_85_2","first-page":"795","volume-title":"Proceedings of Machine Learning and Systems","volume":"4","author":"Wu Carole-Jean","year":"2022","unstructured":"Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga, Jinshi Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin Lee, Bugra Akyildiz, Maximilian Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, and Kim Hazelwood. 2022. Sustainable AI: Environmental Implications, Challenges and Opportunities. In Proceedings of Machine Learning and Systems , D.\u00a0Marculescu, Y.\u00a0Chi, and C.\u00a0Wu (Eds.), Vol.\u00a04. 795\u2013813. https:\/\/proceedings.mlsys.org\/paper_files\/paper\/2022\/file\/462211f67c7d858f663355eff93b745e-Paper.pdf"},{"key":"e_1_3_3_2_86_2","doi-asserted-by":"publisher","unstructured":"Qing Xie Xue Lin Yanzhi Wang Shuang Chen Mohammad\u00a0Javad Dousti and Massoud Pedram. 2015. Performance Comparisons Between 7-nm FinFET and Conventional Bulk CMOS Standard Cell Libraries. IEEE Transactions on Circuits and Systems II: Express Briefs 62 8 (2015) 761\u2013765. 
10.1109\/TCSII.2015.2391632","DOI":"10.1109\/TCSII.2015.2391632"},{"key":"e_1_3_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3593856.3595912"},{"key":"e_1_3_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589059"},{"key":"e_1_3_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO61859.2024.00011"},{"key":"e_1_3_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD51958.2021.9643508"},{"key":"e_1_3_3_2_91_2","first-page":"761","volume-title":"21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24)","author":"Zu Yazhou","year":"2024","unstructured":"Yazhou Zu, Alireza Ghaffarkhah, Hoang-Vu Dang, Brian Towles, Steven Hand, Safeen Huda, Adekunle Bello, Alexander Kolbasov, Arash Rezaei, Dayou Du, Steve Lacy, Hang Wang, Aaron Wisner, Chris Lewis, and Henri Bahini. 2024. Resiliency at Scale: Managing Google\u2019s TPUv4 Machine Learning Supercomputer. In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 24). USENIX Association, Santa Clara, CA, 761\u2013774. 
https:\/\/www.usenix.org\/conference\/nsdi24\/presentation\/zu"}],"event":{"name":"MICRO 2025: 58th IEEE\/ACM International Symposium on Microarchitecture","location":"Seoul Korea","acronym":"MICRO 2025","sponsor":["SIGMICRO ACM Special Interest Group on Microarchitectural Research and Processing"]},"container-title":["Proceedings of the 58th IEEE\/ACM International Symposium on Microarchitecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725843.3756038","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,26]],"date-time":"2026-01-26T21:47:04Z","timestamp":1769464024000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725843.3756038"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,17]]},"references-count":90,"alternative-id":["10.1145\/3725843.3756038","10.1145\/3725843"],"URL":"https:\/\/doi.org\/10.1145\/3725843.3756038","relation":{},"subject":[],"published":{"date-parts":[[2025,10,17]]},"assertion":[{"value":"2025-10-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}