{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T13:48:04Z","timestamp":1772632084902,"version":"3.50.1"},"reference-count":9,"publisher":"IEEE","license":[{"start":{"date-parts":[[2026,2,15]],"date-time":"2026-02-15T00:00:00Z","timestamp":1771113600000},"content-version":"stm-asf","delay-in-days":0,"URL":"https:\/\/doi.org\/10.15223\/policy-029"},{"start":{"date-parts":[[2026,2,15]],"date-time":"2026-02-15T00:00:00Z","timestamp":1771113600000},"content-version":"stm-asf","delay-in-days":0,"URL":"https:\/\/doi.org\/10.15223\/policy-037"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,2,15]]},"DOI":"10.1109\/isscc49663.2026.11409030","type":"proceedings-article","created":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T20:50:24Z","timestamp":1772571024000},"page":"42-44","source":"Crossref","is-referenced-by-count":0,"title":["2.1 AMD Instinct MI350 Series GPUs: CDNA 4-Based 3D-Stacked 3nm XCDs and 6nm IODs for AI Applications"],"prefix":"10.1109","author":[{"given":"Ramasamy","family":"Adaikkalavan","sequence":"first","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Alan","family":"Smith","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Teja","family":"Singh","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Sundar","family":"Rangarajan","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Eric","family":"Chapman","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Samuel","family":"Naffziger","sequence":"additional","affiliation":[{"name":"AMD,Fort 
Collins,CO"}]},{"given":"Subramaniam","family":"Maiyuran","sequence":"additional","affiliation":[{"name":"AMD,Folsom,CA"}]},{"given":"Candy","family":"Chen","sequence":"additional","affiliation":[{"name":"AMD,Shanghai,China"}]},{"given":"Sriram","family":"Sundaram","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Mark","family":"Silla","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Duncan","family":"Law","sequence":"additional","affiliation":[{"name":"AMD,Markham,Canada"}]},{"given":"Kathy","family":"Hoover","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Samuel","family":"Lipson","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Kevin","family":"Duda","sequence":"additional","affiliation":[{"name":"AMD,Fort Collins,CO"}]},{"given":"Vinay","family":"Parthasarathy","sequence":"additional","affiliation":[{"name":"AMD,San Diego,CA"}]},{"given":"Deepesh","family":"John","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Hanish","family":"Vemulapalli","sequence":"additional","affiliation":[{"name":"AMD,Austin,TX"}]},{"given":"Srinivas Pavan Kumar","family":"Gade","sequence":"additional","affiliation":[{"name":"AMD,Hyderabad,India"}]}],"member":"263","reference":[{"key":"ref1","doi-asserted-by":"publisher","DOI":"10.1109\/iedm.2017.8268306"},{"key":"ref2","first-page":"44","article-title":"Zen 5: The AMD High-Performance 4nm x86-64 Microprocessor Core","author":"Singh","year":"2025","journal-title":"ISSCC"},{"key":"ref3","volume-title":"MI350\u2013005: Based on calculations by AMD Performance Labs in May 2025 for the AMD Instinct\u2122 MI355X and MI350X GPUs to determine the peak theoretical precision performance when comparing FP16, FP8, FP6 and FP4 datatypes with Matrix vs. AMD Instinct MI325X, MI300X, MI250X and MI100 GPUs. 
Server manufacturers may vary configurations, yielding different results"},{"key":"ref4","volume-title":"MI350\u2013036 (n.d.). Based on testing by AMD Performance Labs in June 2025, on the AMD Instinct MI350X vs. AMD Instinct MI300X GPUs, using an AMD proprietary synthetic micro-benchmark to determine the memory read bandwidth for each GPU, and an AMD internal tool to collect per watt performance in a time-series format at the 1 ms timescale, for each GPU. Server manufacturers may vary configurations, yielding different results. Results may vary based on use of the latest drivers and optimizations"},{"key":"ref5","volume-title":"MI350\u2013042 (n.d.). Based on AMD measurements as of 6\/6\/2025 of the text-generated offline inference throughput for Llama 3.1-405B chat model on an 8x GPU AMD Instinct MI355X platform running 8x TP1 (8 copies of model on 1 GPU) with (FP4) compared to an 8 \u00d7 GPU AMD Instinct MI300X platform running 2 \u00d7 TP4 (2 copies of model on 4 GPUs) with (FP8). Tests were conducted using a synthetic dataset with different combinations of 128 and 2048 input\/output tokens. Server manufacturers may vary configurations, yielding different results. Performance may vary based on the use of the latest drivers and optimizations"},{"key":"ref6","doi-asserted-by":"publisher","DOI":"10.1109\/hcs61935.2024.10664659"},{"key":"ref7","volume-title":"MI350\u2013038: Based on testing by AMD internal labs as of 6\/6\/2025 measuring text generated throughput for LLaMA 3.1-405B model using FP4 datatype. Test was performed using input length of 128 tokens and an output length of 2048 tokens for AMD Instinct\u2122 MI355X 8\u00d7GPU platform compared to NVIDIA B200 HGX 8\u00d7GPU platform published results. Server manufacturers may vary configurations, yielding different results. 
Performance may vary based on use of latest drivers and optimizations"},{"key":"ref8","volume-title":"MI350\u2013039: Based on Lucid automation framework testing by AMD labs as of 6\/6\/2025, measuring text generated throughput for LLaMA 3.1-405B model using FP4 datatype. Test was performed using 4 different combinations of input\/output lengths (128\/2048) to achieve a mean score of tokens per second for AMD Instinct\u2122 MI355X 4\u00d7GPU platform compared to NVIDIA DGX GB200 4\u00d7GPU platform. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations"},{"key":"ref9","volume-title":"MI350\u2013040: Based on testing (tokens per second) by AMD internal labs as of 6\/6\/2025 measuring text generated online serving throughput for DeepSeek-R1 chat model using FP4 datatype. Test was performed using input length of 3200 tokens and an output length of 800 tokens with concurrency up to 64 looks, serviceable with 30 ms ITL threshold for AMD Instinct\u2122 MI355X 8xGPU platform median total tokens compared to NVIDIA B200 HGX 8xGPU platform results. Server manufacturers may vary configurations, yielding different results. 
Performance may vary based on use of latest drivers and optimizations"}],"event":{"name":"2026 IEEE International Solid-State Circuits Conference (ISSCC)","location":"San Francisco, CA, USA","start":{"date-parts":[[2026,2,15]]},"end":{"date-parts":[[2026,2,19]]}},"container-title":["2026 IEEE International Solid-State Circuits Conference (ISSCC)"],"original-title":[],"link":[{"URL":"http:\/\/xplorestaging.ieee.org\/ielx8\/11408863\/11408946\/11409030.pdf?arnumber=11409030","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T06:50:30Z","timestamp":1772607030000},"score":1,"resource":{"primary":{"URL":"https:\/\/ieeexplore.ieee.org\/document\/11409030\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,15]]},"references-count":9,"URL":"https:\/\/doi.org\/10.1109\/isscc49663.2026.11409030","relation":{},"subject":[],"published":{"date-parts":[[2026,2,15]]}}}