{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T14:03:07Z","timestamp":1762869787422,"version":"build-2065373602"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>Generative AI (GenAI) is one of the most critical applications today, continually challenging the limits of semiconductor technology. We introduce a very fine-grained 3D memory-on-logic architecture along with a novel data mapping strategy to support Large Language Model (LLM)-based GenAI, including both prefill and generation stages. Our conceptual analysis shows how ultradense 3D connectivity can enhance text generation speed and energy-efficiency well-beyond current limits. Preliminary findings from a basic analytical model indicate that the single batch autoregressive generation rate for Llama 3.2 1B could surpass 5K tokens\/sec by maximizing weight locality and enhancing memory bandwidth through massively parallel 3D links between Multiply-Accumulate (MAC) units in the logic tier and their dedicated memory partitions in the 3D stack. We also explore the impact of advanced logic nodes and quantify their benefits in reducing prefill latency. 
Finally, we examine the challenges associated with memory access power and power density under extreme bandwidth conditions and present pipelined access strategies to address them.<\/jats:p>","DOI":"10.1145\/3768168","type":"journal-article","created":{"date-parts":[[2025,9,12]],"date-time":"2025-09-12T11:51:04Z","timestamp":1757677864000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Ultrafast Generative AI by Ultradense 3D Integration: A Case Study on LLM-based Edge Inference"],"prefix":"10.1145","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5957-826X","authenticated-orcid":false,"given":"Kerem","family":"Akarvardar","sequence":"first","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America","place":["San Jose, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5337-5680","authenticated-orcid":false,"given":"Xiaoyu","family":"Sun","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America","place":["San Jose, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-0693-6887","authenticated-orcid":false,"given":"Brian","family":"Crafton","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America","place":["San Jose, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6148-7711","authenticated-orcid":false,"given":"Xiaochen","family":"Peng","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America","place":["San Jose, United 
States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8149-393X","authenticated-orcid":false,"given":"Haruki","family":"Mori","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Co Ltd","place":["Hsinchu, Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7721-271X","authenticated-orcid":false,"given":"Abhiroop","family":"Bhattacharjee","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Company North America","place":["San Jose, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-0992-5859","authenticated-orcid":false,"given":"Hidehiro","family":"Fujiwara","sequence":"additional","affiliation":[{"name":"Taiwan Semiconductor Manufacturing Co Ltd","place":["Hsinchu, Taiwan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0096-1472","authenticated-orcid":false,"given":"H.-S. Philip","family":"Wong","sequence":"additional","affiliation":[{"name":"Electrical Engineering, Stanford University","place":["Stanford, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,11,11]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2022.3218057"},{"volume-title":"Tutorial T9, 2025 IEEE International Solid-State Circuits Conference (ISSCC)","key":"e_1_3_4_3_2","unstructured":"P. Whatmough. 2025. Generative AI on Edge Devices: Models, Hardware, and Systems. Tutorial T9, 2025 IEEE International Solid-State Circuits Conference (ISSCC)."},{"key":"e_1_3_4_4_2","unstructured":"OpenAI. 2022. Introducing ChatGPT. Retrieved from https:\/\/openai.com\/index\/chatgpt\/. Accessed: 5.15.2025."},{"key":"e_1_3_4_5_2","unstructured":"S. Kim C. Hooper T. Wattanawong M. Kang R. Yan H. Genc G. Dinh Q. Huang K. Keutzer M. W. 
Mahoney and Y. S. Shao. 2023. Full stack optimization of transformer inference: A survey. arXiv preprint arXiv:2302.14017. Retrieved from https:\/\/arxiv.org\/abs\/2302.14017"},{"key":"e_1_3_4_6_2","unstructured":"L. Chen Z. Wang S. Ren L. Li H. Zhao Y. Li Z. Cai H. Guo L. Zhang Y. Xiong Y. Zhang R. Wu Q. Dong G. Zhang J. Yang L. Meng S. Hu Y. Chen J. Lin S. Bai A. Vlachos X. Tan M. Zhang W. Xiao A. Yee T. Liu and B. Chang. 2024. Next token prediction towards multimodal intelligence: A comprehensive survey. arXiv preprint arXiv:2412.18619. Retrieved from https:\/\/arxiv.org\/abs\/2412.18619"},{"key":"e_1_3_4_7_2","first-page":"2023","volume-title":"Proceedings of the 2023 International Electron Devices Meeting","year":"2023","unstructured":"Y. Wang, Y. H. Chen, Y. D. Chih, H. Fujiwara, H. Mori, Y. J. Wang, and T. Y. J. Chang. 2023. High-speed embedded memory for AI and high-performance compute. In Proceedings of the 2023 International Electron Devices Meeting. IEEE, 2023."},{"volume-title":"Proceedings of the 2024 IEEE International Solid-State Circuits Conference","year":"2024","key":"e_1_3_4_8_2","unstructured":"Y. C. Huang, S. H Liu, H. S. Chen, H. C. Feng, C. F. Li, C. Y. Yang, W. K. Chang, C. F. Yang, C. Y. Wu, Y. C. Lin, and T. T. Yang. 2024. 15.7 A 32Mb RRAM in a 12nm FinFET Technology with a 0.0249 \u03bcm 2 Bit-Cell, a 3.2 GB\/S Read Throughput, a 10K Cycle Write Endurance and a 10-Year Retention at 105\u00b0 C. In Proceedings of the 2024 IEEE International Solid-State Circuits Conference. IEEE, 2024."},{"key":"e_1_3_4_9_2","first-page":"331","volume-title":"Microelectronic Engineering","volume":"88","year":"2011","unstructured":"M. Vinet, P. Batude, C. Tabone, B. Previtali, C. LeRoyer, A. Pouydebasque, L. Clavelier, A. Valentian, O. Thomas, S. Michaud, and L. Sanchez. 2011. 3D monolithic integration: Technological challenges and electrical results. 
Microelectronic Engineering 88, 4 (2011), 331\u2013335."},{"volume-title":"Proceedings of the 2017 IEEE International Memory Workshop","year":"2017","key":"e_1_3_4_10_2","unstructured":"H. Jun, J. Cho, K. Lee, H.Y. Son, K. Kim, H. Jin, and K. Kim. 2017. HBM (High Bandwidth Memory) DRAM technology and architecture. In Proceedings of the 2017 IEEE International Memory Workshop. IEEE, 2017."},{"volume-title":"Proceedings of the 2024 IEEE 74th Electronic Components and Technology Conference","key":"e_1_3_4_11_2","unstructured":"W.-M Wang, C. W. Yeh, H. J. Chia, R. F. Tsui, J. J. Cui, C. H. Tung, K. C. Yee, and D. C. H. Yu. 2024. A study of low temperature SoIC targeting 200 nm bond pitch. In Proceedings of the 2024 IEEE 74th Electronic Components and Technology Conference."},{"volume-title":"Proceedings of the 2024 IEEE International Electron Devices Meeting","year":"2024","key":"e_1_3_4_12_2","unstructured":"Y.-M. Chen, T. Ko, K. C. Ting, S. K. Goel, A. Patidar, K. H. Tam, K. Huang, W. P. Changchien, W. Y. Wang, S. H. Huang, C. Y. Huang, C. H. Wang, W. Lai, Y. H. Lung, S. C. Lin, S. F. Yeh, C. W. Shih, T. J. Wu, Y. C. Lin, Y. H. Chen, H. J. Lin, M. S. Yeh, T. Y. Chen, H. Y. Pan, T. S. Lin, C. C. Hu, C. Bair, S. B. Jan, L.C. Hung, L. W. Wang, D. H. Chen, C. H. Yao, T. C. Huang, J. H. Shieh, W. C. Chiou, S. S. Lin, F. Lee, G. Yeap, L. C. Lu, and K. C. Hsu. 2024. Next generation TSMC-SoIC\u00ae platform for ultra-high bandwidth HPC application. In Proceedings of the 2024 IEEE International Electron Devices Meeting. IEEE, 2024."},{"key":"e_1_3_4_13_2","first-page":"1","volume-title":"Proceedings of the 2024 IEEE International Electron Devices Meeting","year":"2024","unstructured":"J. Wuu, M. Mantor, G. H. Loh, A. Smith, D. Johnson, D. Fisher, B. Johnson, C. Henrion, R. Schreiber, J. Lucas, and S. Dussinger. 2024. Coevolution of chiplet technology and cache architecture for ai and compute. In Proceedings of the 2024 IEEE International Electron Devices Meeting. 
San Francisco, CA, 1\u20134."},{"volume-title":"Proceedings of the 2024 IEEE Symposium on VLSI Technology and Circuits","year":"2024","key":"e_1_3_4_14_2","unstructured":"A. Smith, G. H. Loh, J. Wuu, S. Naffziger, T. Huang, H. McIntyre, R. Mangaser, W. Jung, and R. Swaminathan. 2024. AMD Instinct\u2122 MI300X accelerator: Packaging and architecture co-optimization. In Proceedings of the 2024 IEEE Symposium on VLSI Technology and Circuits. IEEE, 2024."},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42613.2021.9365766"},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC42614.2022.9731754"},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2023.3333290"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2021.3113475"},{"key":"e_1_3_4_19_2","first-page":"10","volume-title":"IEEE Micro","volume":"38","year":"2018","unstructured":"N. Jouppi, C. Young, N. Patil, and D. Patterson. 2018. Motivation for and evaluation of the first tensor processing unit. IEEE Micro 38, 3 (2018), 10\u201319."},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2024.3373763"},{"volume-title":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","key":"e_1_3_4_21_2","unstructured":"S.-C. Kao, H. Kwon, M. Pellauer, A. Parashar, and T. Krishna. 2022. A formalism of DNN accelerator flexibility. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6.2 (2022), 1--23."},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TED.2020.3021358"},{"key":"e_1_3_4_23_2","unstructured":"Meta. 2024. Introducing quantized Llama models with increased speed and a reduced memory footprint. Retrieved from https:\/\/ai.meta.com\/blog\/meta-llama-quantized-lightweight-models\/. Accessed: 5.15.2025."},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC49661.2025.10904759"},{"key":"e_1_3_4_25_2","unstructured":"TechInsights. 2025. 
Micron D1? '14 nm'! The Most Advanced Node Ever on DRAM! Retrieved from: https:\/\/www.techinsights.com\/blog\/memory\/micron-1a-dram-technology. Accessed: 5.15.2025."},{"key":"e_1_3_4_26_2","unstructured":"EETimes. 2025. D-Matrix Targets Fast LLM Inference for 'Real World Scenarios'. Retrieved from https:\/\/www.eetimes.com\/d-matrix-targets-fast-llm-inference-for-real-world-scenarios\/. Accessed: 5.15.2025."},{"key":"e_1_3_4_27_2","unstructured":"Apple. 2022. Deploying Transformers on the Apple Neural Engine. Retrieved from https:\/\/machinelearning.apple.com\/research\/neural-engine-transformers. Accessed: 5.15.2025."},{"key":"e_1_3_4_28_2","unstructured":"M. O'Connor. 2021. Energy efficient high bandwidth DRAM for throughput processors. Ph.D. Dissertation. Retrieved from https:\/\/repositories.lib.utexas.edu\/items\/6cde2f48-be0b-4c10-be41-337accaa1f2b. Accessed: 5.15.2025."},{"key":"e_1_3_4_29_2","unstructured":"NVIDIA. 2025. Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era. Retrieved from https:\/\/developer.nvidia.com\/blog\/inside-nvidia-blackwell-ultra-the-chip-powering-the-ai-factory-era\/. Accessed: 9.25.2025."},{"key":"e_1_3_4_30_2","unstructured":"Telnyx. 2025. Comprehensive guide to embedding layers in NLP. Retrieved from https:\/\/telnyx.com\/learn-ai\/embedding-layer. Accessed: 9.25.2025."},{"key":"e_1_3_4_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.58"},{"volume-title":"Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture","year":"2017","key":"e_1_3_4_32_2","unstructured":"M. O'Connor, N. Chatterjee, D. Lee, J. Wilson, A. Agrawal, S. W. Keckler, and W. J. Dally. 2017. Fine-grained DRAM: Energy-efficient DRAM for extreme bandwidth systems. 
In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture."},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/ECTC51529.2024.00171"},{"key":"e_1_3_4_34_2","first-page":"380","volume-title":"ACM SIGARCH Computer Architecture News","volume":"44","year":"2016","unstructured":"D. Kim, J. Kung, S. Chai, S. Yalamanchili, and S. Mukhopadhyay. 2016. Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory. ACM SIGARCH Computer Architecture News 44, 3 (2016), 380\u2013392."},{"volume-title":"Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems","year":"2017","key":"e_1_3_4_35_2","unstructured":"M. Gao, J. Pu, X. Yang, M. Horowitz, and C. Kozyrakis. 2017. Tetris: Scalable and efficient neural network acceleration with 3d memory. In Proceedings of the 22nd International Conference on Architectural Support for Programming Languages and Operating Systems. 2017."},{"volume-title":"Proceedings of the 29th ACM\/IEEE International Symposium on Low Power Electronics and Design","year":"2024","key":"e_1_3_4_36_2","unstructured":"H. J. Byun, U. Gupta, and J.-S. Seo. 2024. 3D IC architecture evaluation and optimization with digital compute-in-memory designs. In Proceedings of the 29th ACM\/IEEE International Symposium on Low Power Electronics and Design. 2024."},{"key":"e_1_3_4_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3649219"},{"volume-title":"IEEE Nanotechnology Magazine","year":"2025","key":"e_1_3_4_38_2","unstructured":"P.-K. Hsu, J. Sharda, X. Wu, H.-S. P. Wong, and S. Yu. 2025. Monolithic 3D Stackable DRAM. IEEE Nanotechnology Magazine (2025)."},{"volume-title":"Proceedings of the 2024 IEEE 6th International Conference on AI Circuits and Systems","year":"2024","key":"e_1_3_4_39_2","unstructured":"J. Sharda, P. K. Hsu, and S. Yu. 2024. Accelerator design using 3D stacked capacitorless DRAM for large language models. 
In Proceedings of the 2024 IEEE 6th International Conference on AI Circuits and Systems. IEEE, 2024."},{"volume-title":"Proceedings of the 2024 IEEE International Electron Devices Meeting","year":"2024","key":"e_1_3_4_40_2","unstructured":"H. Choi, G. Kim, W. Shin, J. Won, C. Kim, H. Joo, B. An, G. Shin, J. Kim, D. Yun, J. Park, and Y. Song. 2024. AiMX: Accelerator-in Memory Based Accelerator for Cost-effective Large Language Model Inference. In Proceedings of the 2024 IEEE International Electron Devices Meeting. IEEE, (2024)."},{"volume-title":"Proceedings of the 2023 IEEE Hot Chips 35 Symposium","year":"2023","key":"e_1_3_4_41_2","unstructured":"J. H. Kim, Y. Ro, J. So, S. Lee, S. Kang, Y. G. Cho, H. Kim, B. Kim, K. Kim, S. Park, and J. S. Kim. 2023. Samsung PIM\/PNM for transformer based AI: Energy efficiency on PIM\/PNM cluster. In Proceedings of the 2023 IEEE Hot Chips 35 Symposium."},{"key":"e_1_3_4_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3256384"},{"key":"e_1_3_4_43_2","unstructured":"Cerebras. 2024. Introducing Cerebras Inference: AI at Instant Speed. Retrieved from https:\/\/www.cerebras.ai\/blog\/introducing-cerebras-inference-ai-at-instant-speed. Accessed: 5.15.2025."},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.2981715"},{"key":"e_1_3_4_45_2","unstructured":"GLOBAL SMT&PACKAGING. 2024. A new round of technological innovation in memory market on the way. Retrieved from https:\/\/www.globalsmt.net\/advanced-packaging\/a-new-round-of-technological-innovation-in-memory-market-on-the-way\/. Accessed: 5.15.2025."},{"key":"e_1_3_4_46_2","unstructured":"S. Raschka. 2025. Llama 3.2 From Scratch (A Standalone Notebook). Retrieved from https:\/\/github.com\/rasbt\/LLMs-from-scratch\/blob\/main\/ch05\/07_gpt_to_llama\/standalone-llama32.ipynb. Accessed: 5.15.2025."},{"key":"e_1_3_4_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEDM50854.2024.10873508"},{"key":"e_1_3_4_48_2","unstructured":"B. Dally. 2022. 
Very High Bandwidth Memory \u2013 Avoiding the Memory Wall. in \u201cInsights from NVIDIA Research.\u201d 2022. https:\/\/www.nvidia.com\/en-us\/on-demand\/session\/gtcfall22-a41187\/"},{"key":"e_1_3_4_49_2","unstructured":"S. Legtchenko I. Stefanovici R. Black A. Rowstron J. Liu P. Costa B. Canakci D. Narayanan and X. Wu. 2025. Managed-retention memory: A new class of memory for the AI Era. arXiv preprint arXiv:2501.09605. Retrieved from https:\/\/arxiv.org\/abs\/2501.09605"},{"volume-title":"2024 ACM\/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)","key":"e_1_3_4_50_2","unstructured":"H. Zhang, A. Ning, R. B. Prabhakar, and D. Wentzlaff. 2024. LLMCompass: Enabling efficient hardware design for large language model inference. 2024 ACM\/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). IEEE."},{"volume-title":"Proceedings of the International Symposium on Memory Systems","key":"e_1_3_4_51_2","unstructured":"L. Yang, C. Kao, S. Srikanth, H. E. Sumbul, T. F. Wu, H. Liu, and E. Beigne. 2024. Characterization and design of 3D-stacked memory for image signal processing on AR\/VR devices. Proceedings of the International Symposium on Memory Systems."},{"key":"e_1_3_4_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3338059"},{"volume-title":"Proceedings of the 24th Edition of the Great Lakes Symposium on VLSI","key":"e_1_3_4_53_2","unstructured":"T. Zhang, C. Xu, K. Chen, G. Sun, and Y. Xie. 2014. 3D-SWIFT: A high-performance 3D-stacked wide IO DRAM. Proceedings of the 24th Edition of the Great Lakes Symposium on VLSI."},{"key":"e_1_3_4_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2022.3202254"},{"volume-title":"Proceedings of the 2024 IEEE 74th Electronic Components and Technology Conference","year":"2024","key":"e_1_3_4_55_2","unstructured":"C. S. Mandalapu, C. Buch, P. Shah, R. Topacio, P. Cheng, L. Wang, and R. Swaminathan. 2024. 
3.5 D Advanced Packaging Enabling Heterogenous Integration of HPC and AI Accelerators. In Proceedings of the 2024 IEEE 74th Electronic Components and Technology Conference. IEEE, (2024)."},{"key":"e_1_3_4_56_2","unstructured":"J. Cheng and B. V. Durme. 2024. Compressed chain of thought: Efficient reasoning through dense representations. arXiv:2412.13171. Retrieved from https:\/\/arxiv.org\/abs\/2412.13171"},{"key":"e_1_3_4_57_2","first-page":"10088","volume-title":"Advances in Neural Information Processing Systems","volume":"36","year":"2023","unstructured":"T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer. 2023. QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems 36 (2023), 10088\u201310115."},{"key":"e_1_3_4_58_2","unstructured":"Z. Liu C. Zhao I. Fedorov B. Soran D. Choudhary R. Krishnamoorthi V. Chandra Y. Tian and T. Blankevoort. 2024. Spinquant: LLM quantization with learned rotations. arXiv preprint arXiv:2405.16406. Retrieved from https:\/\/arxiv.org\/abs\/2405.16406"},{"key":"e_1_3_4_59_2","unstructured":"Wikipedia. Double data rate. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Double_data_rate. Accessed: 5.15.2025."},{"volume-title":"2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC)","key":"e_1_3_4_60_2","unstructured":"P. Shukla, M. Hajikhodaverdian, V. F. Pavlidis, E. Salman, and A. K. Coskun. 2024. Energy-efficient dataflow design for monolithic 3D systolic arrays with resistive RAM. 2024 IEEE 15th International Green and Sustainable Computing Conference (IGSC). IEEE."},{"volume-title":"Proceedings of the 52nd Annual International Symposium on Computer Architecture","key":"e_1_3_4_61_2","unstructured":"C. Li, Y. Yin, X. Wu, J. Zhu, Z. Gao, D. Niu, Q. Wu, X. Si, Y. Xie, C. Zhang, and G. Sun. 2025. H2-LLM: Hardware-dataflow co-exploration for heterogeneous hybrid-bonding-based low-batch LLM inference. 
Proceedings of the 52nd Annual International Symposium on Computer Architecture."},{"key":"e_1_3_4_62_2","unstructured":"Qualcomm. 2024. Snapdragon 8 Elite Mobile Platform. Retrieved from https:\/\/www.qualcomm.com\/products\/mobile\/snapdragon\/smartphones\/snapdragon-8-series-mobile-platforms\/snapdragon-8-elite-mobile-platform. Accessed: 5.15.2025."},{"key":"e_1_3_4_63_2","unstructured":"Mediatek. 2024. MediaTek Dimensity 9400. Retrieved from https:\/\/www.mediatek.com\/products\/smartphones\/mediatek-dimensity-9400. Accessed: 5.15.2025."},{"key":"e_1_3_4_64_2","unstructured":"NVIDIA. 2025. NVIDIA Jetson Orin. Retrieved from https:\/\/www.nvidia.com\/en-us\/autonomous-machines\/embedded-systems\/jetson-orin\/. Accessed: 5.15.2025."},{"key":"e_1_3_4_65_2","first-page":"1","volume-title":"ACM Transactions on Design Automation of Electronic Systems","volume":"29","year":"2024","unstructured":"X. Sun, X. Peng, S. Q. Zhang, J. Gomez, W. S. Khwa, S. S. Sarwar, Z. Li, W. Cao, Z. Wang, and C. Liu. 2024. Estimating power, performance, and area for on-sensor deployment of AR\/VR workloads using an analytical framework. ACM Transactions on Design Automation of Electronic Systems 29, 6 (2024), 1\u201327."},{"volume-title":"Proceedings of the 2020 IEEE International Symposium on Performance Analysis of Systems and Software","year":"2020","key":"e_1_3_4_66_2","unstructured":"A. Samajdar, J. M. Joseph, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna. 2020. A systematic methodology for characterizing scalability of dnn accelerators using scale-sim. In Proceedings of the 2020 IEEE International Symposium on Performance Analysis of Systems and Software. IEEE."},{"key":"e_1_3_4_67_2","first-page":"1054","volume-title":"Proceedings of the 2024 IEEE 74th Electronic Components and Technology Conference","year":"2024","unstructured":"K. Chatterjee, Y. Li, H. Chang, M. Damadam, P. Asrar, J. Kim, G. Jeong, and W. P. Kim. 2024. 
Thermal and mechanical simulations of 3D packages with custom high bandwidth memory (HBM). In Proceedings of the 2024 IEEE 74th Electronic Components and Technology Conference. Denver, CO, USA, (2024), 1054\u20131059."},{"volume-title":"Autex Research Journal","key":"e_1_3_4_68_2","unstructured":"G. Zhu, D. Kremenakova, Y. Wang, J. Militky, and F. B. Mazari. 2014. An analysis of effective thermal conductivity of heterogeneous materials. Autex Research Journal 14, 1 (2014), 14--21."},{"key":"e_1_3_4_69_2","unstructured":"Digitaltrends. 2025. What is vapor cooling? The fascinating tech keeping your smartphone cool. Retrieved from https:\/\/www.digitaltrends.com\/mobile\/what-is-vapor-chamber-cooling-smartphones-tested-explained\/. Accessed: 5.15.2025."},{"key":"e_1_3_4_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/SEMI-THERM.2015.7100139"},{"key":"e_1_3_4_71_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.est.2023.108219"},{"key":"e_1_3_4_72_2","unstructured":"SIEMENS. 2021. The thermal benefits of vapor chambers. Retrieved from https:\/\/blogs.sw.siemens.com\/simcenter\/the-thermal-benefits-of-vapor-chambers\/. 
Accessed: 5.15.2025."}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3768168","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T13:50:43Z","timestamp":1762869043000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3768168"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,11]]},"references-count":71,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3768168"],"URL":"https:\/\/doi.org\/10.1145\/3768168","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2025,11,11]]},"assertion":[{"value":"2025-05-23","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}