{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T06:06:07Z","timestamp":1774764367440,"version":"3.50.1"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"7975","license":[{"start":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T00:00:00Z","timestamp":1692748800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T00:00:00Z","timestamp":1692748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nature"],"published-print":{"date-parts":[[2023,8,24]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Models of artificial intelligence (AI) that have billions of parameters can achieve high accuracy across a range of tasks<jats:sup>1,2<\/jats:sup>, but they exacerbate the poor energy efficiency of conventional general-purpose processors, such as graphics processing units or central processing units. Analog in-memory computing (analog-AI)<jats:sup>3\u20137<\/jats:sup> can provide better energy efficiency by performing matrix\u2013vector multiplications in parallel on \u2018memory tiles\u2019. However, analog-AI has yet to demonstrate software-equivalent (SW<jats:sub>eq<\/jats:sub>) accuracy on models that require many such tiles and efficient communication of neural-network activations between the tiles. Here we present an analog-AI chip that combines 35 million phase-change memory devices across 34 tiles, massively parallel inter-tile communication and analog, low-power peripheral circuitry that can achieve up to 12.4 tera-operations per second per watt (TOPS\/W) chip-sustained performance. 
We demonstrate fully end-to-end SW<jats:sub>eq<\/jats:sub> accuracy for a small keyword-spotting network and near-SW<jats:sub>eq<\/jats:sub> accuracy on the much larger MLPerf<jats:sup>8<\/jats:sup> recurrent neural-network transducer\u00a0(RNNT), with more than 45 million weights mapped onto more than 140 million phase-change memory devices across five chips.<\/jats:p>","DOI":"10.1038\/s41586-023-06337-5","type":"journal-article","created":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T16:03:56Z","timestamp":1692806636000},"page":"768-775","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":188,"title":["An analog-AI chip for energy-efficient speech recognition and transcription"],"prefix":"10.1038","volume":"620","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5475-4209","authenticated-orcid":false,"given":"S.","family":"Ambrogio","sequence":"first","affiliation":[]},{"given":"P.","family":"Narayanan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5275-5224","authenticated-orcid":false,"given":"A.","family":"Okazaki","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6892-5139","authenticated-orcid":false,"given":"A.","family":"Fasoli","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8413-5583","authenticated-orcid":false,"given":"C.","family":"Mackin","sequence":"additional","affiliation":[]},{"given":"K.","family":"Hosokawa","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2354-867X","authenticated-orcid":false,"given":"A.","family":"Nomura","sequence":"additional","affiliation":[]},{"given":"T.","family":"Yasuda","sequence":"additional","affiliation":[]},{"given":"A.","family":"Chen","sequence":"additional","affiliation":[]},{"given":"A.","family":"Friz","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0794-7232","authenticated-orcid":false,"given":"M.","family":"Ishii","sequence":"additional","affiliation":[]},{"given":"J.","family":"Luquin","sequence":"additional","affiliation":[]},{"given":"Y.","family":"Kohda","sequence":"additional","affiliation":[]},{"given":"N.","family":"Saulnier","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2515-2882","authenticated-orcid":false,"given":"K.","family":"Brew","sequence":"additional","affiliation":[]},{"given":"S.","family":"Choi","sequence":"additional","affiliation":[]},{"given":"I.","family":"Ok","sequence":"additional","affiliation":[]},{"given":"T.","family":"Philip","sequence":"additional","affiliation":[]},{"given":"V.","family":"Chan","sequence":"additional","affiliation":[]},{"given":"C.","family":"Silvestre","sequence":"additional","affiliation":[]},{"given":"I.","family":"Ahsan","sequence":"additional","affiliation":[]},{"given":"V.","family":"Narayanan","sequence":"additional","affiliation":[]},{"given":"H.","family":"Tsai","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5717-2549","authenticated-orcid":false,"given":"G. W.","family":"Burr","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,8,23]]},"reference":[{"key":"6337_CR1","unstructured":"Vaswani, A. et al. Attention is all you need. In NIPS17: Proc. 31st Conference on Neural Information Processing Systems (eds. von Luxburg, U. et al.) 6000\u20136010 (Curran Associates, 2017)."},{"key":"6337_CR2","unstructured":"Chan, W. et al. SpeechStew: simply mix all available speech recognition data to train one large neural network. Preprint at https:\/\/arxiv.org\/abs\/2104.02133 (2021)."},{"key":"6337_CR3","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1038\/s41586-018-0180-5","volume":"558","author":"S Ambrogio","year":"2018","unstructured":"Ambrogio, S. et al. Equivalent-accuracy accelerated neural-network training using analogue memory. 
Nature 558, 60\u201367 (2018).","journal-title":"Nature"},{"key":"6337_CR4","doi-asserted-by":"crossref","unstructured":"Narayanan, P. et al. Fully on-chip MAC at 14 nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format.\u00a0IEEE Trans. Electron Devices 68, 6629\u20136636 (2021).","DOI":"10.1109\/TED.2021.3115993"},{"key":"6337_CR5","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.1109\/JSSC.2022.3140414","volume":"57","author":"R Khaddam-Aljameh","year":"2022","unstructured":"Khaddam-Aljameh, R. et al. HERMES-core\u2014a 1.59-TOPS\/mm2 PCM on 14-nm CMOS in-memory compute core using 300-ps\/LSB linearized CCO-based ADCs. IEEE J. Solid-State Circuits 57, 1027\u20131038 (2022).","journal-title":"IEEE J. Solid-State Circuits"},{"key":"6337_CR6","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1038\/s41586-020-1942-4","volume":"577","author":"P Yao","year":"2020","unstructured":"Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641\u2013646 (2020).","journal-title":"Nature"},{"key":"6337_CR7","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1038\/s41586-022-04992-8","volume":"608","author":"W Wan","year":"2022","unstructured":"Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608, 504\u2013512 (2022).","journal-title":"Nature"},{"key":"6337_CR8","unstructured":"Better Machine Learning for Everyone. ML Commons https:\/\/mlcommons.org (2023)."},{"key":"6337_CR9","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436\u2013444 (2015).","journal-title":"Nature"},{"key":"6337_CR10","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1109\/TASL.2011.2134090","volume":"20","author":"GE Dahl","year":"2011","unstructured":"Dahl, G. 
E., Yu, D., Deng, L. & Acero, A. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 30\u201342 (2011).","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"6337_CR11","doi-asserted-by":"crossref","unstructured":"Graves, A., Fern\u00e1ndez, S., Gomez, F. & Schmidhuber, J. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML \u201906: Proc. 23rd International Conference on Machine Learning (eds Cohen, W. & Moore, A.) 369\u2013376 (ACM, 2006).","DOI":"10.1145\/1143844.1143891"},{"key":"6337_CR12","doi-asserted-by":"crossref","unstructured":"Graves, A. Sequence transduction with recurrent neural networks. Preprint at https:\/\/arxiv.org\/abs\/1211.3711 (2012).","DOI":"10.1007\/978-3-642-24797-2_3"},{"key":"6337_CR13","doi-asserted-by":"crossref","unstructured":"Graves, A., Mohamed, A.-R. & Hinton, G. Speech recognition with deep recurrent neural networks. In Proc. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing 6645\u20136649 (IEEE, 2013).","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"6337_CR14","unstructured":"Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https:\/\/arxiv.org\/abs\/1409.0473 (2014)."},{"key":"6337_CR15","doi-asserted-by":"publisher","first-page":"3451","DOI":"10.1109\/TASLP.2021.3122291","volume":"29","author":"W-N Hsu","year":"2021","unstructured":"Hsu, W.-N. et al. HuBERT: self-supervised speech representation learning by masked prediction of hidden units. IEEE\/ACM Trans. Audio Speech Lang. Process. 29, 3451\u20133460 (2021).","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"6337_CR16","doi-asserted-by":"crossref","unstructured":"Gulati, A. et al. Conformer: convolution-augmented transformer for speech recognition. 
Preprint at https:\/\/arxiv.org\/abs\/2005.08100 (2020).","DOI":"10.21437\/Interspeech.2020-3015"},{"key":"6337_CR17","doi-asserted-by":"crossref","unstructured":"Panayotov, V., Chen, G., Povey, D. & Khudanpur, S. Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 5206\u20135210 (IEEE, 2015).","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"6337_CR18","doi-asserted-by":"crossref","unstructured":"Godfrey, J., Holliman, E. & McDaniel, J. SWITCHBOARD: telephone speech corpus for research and development. In ICASSP-92: Proc. International Conference on Acoustics, Speech and Signal Processing 517\u2013520 (IEEE, 1992).","DOI":"10.1109\/ICASSP.1992.225858"},{"key":"6337_CR19","unstructured":"Gholami, A., Yao, Z., Kim, S., Mahoney, M. W. & Keutzer, K. AI and memory wall. RiseLab Medium https:\/\/medium.com\/riselab\/ai-and-memory-wall-2cb4265cb0b8 (2021)."},{"key":"6337_CR20","doi-asserted-by":"crossref","unstructured":"Jain, S. et al. A heterogeneous and programmable compute-in-memory accelerator architecture for analog-AI using dense 2-D mesh.\u00a0IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 31, 114\u2013127 (2023).","DOI":"10.1109\/TVLSI.2022.3221390"},{"key":"6337_CR21","doi-asserted-by":"crossref","unstructured":"Chen, G., Parada, C. & Heigold, G. Small-footprint keyword spotting using deep neural networks. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4087\u20134091 (2014).","DOI":"10.1109\/ICASSP.2014.6854370"},{"key":"6337_CR22","unstructured":"Zhang, Y., Suda, N., Lai, L. & Chandra, V. Hello edge: keyword spotting on microcontrollers. Preprint at https:\/\/arxiv.org\/abs\/1711.07128 (2018)."},{"key":"6337_CR23","doi-asserted-by":"crossref","unstructured":"Gokmen, T., Rasch, M. J. & Haensch, W. The marriage of training and inference for scaled deep learning analog hardware. 
In 2019 IEEE International Electron Devices Meeting (IEDM) 22.3.1\u201322.3.4 (2019).","DOI":"10.1109\/IEDM19573.2019.8993573"},{"key":"6337_CR24","doi-asserted-by":"publisher","first-page":"675741","DOI":"10.3389\/fncom.2021.675741","volume":"15","author":"K Spoon","year":"2021","unstructured":"Spoon, K. et al. Toward software-equivalent accuracy on transformer-based deep neural networks with analog memory devices. Front. Comput. Neurosci. 15, 675741 (2021).","journal-title":"Front. Comput. Neurosci."},{"key":"6337_CR25","doi-asserted-by":"publisher","first-page":"4356","DOI":"10.1109\/TED.2021.3089987","volume":"68","author":"S Kariyappa","year":"2021","unstructured":"Kariyappa, S. et al. Noise-resilient DNN: tolerating noise in PCM-based AI accelerators via noise-aware training. IEEE Trans. Electron Devices 68, 4356\u20134362 (2021).","journal-title":"IEEE Trans. Electron Devices"},{"key":"6337_CR26","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-020-16108-9","volume":"11","author":"V Joshi","year":"2020","unstructured":"Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11, 2473 (2020).","journal-title":"Nat. Commun."},{"key":"6337_CR27","doi-asserted-by":"crossref","unstructured":"Macoskey, J., Strimel, G. P., Su, J. & Rastrow, A. Amortized neural networks for low-latency speech recognition. Preprint at https:\/\/arxiv.org\/abs\/2108.01553 (2021).","DOI":"10.21437\/Interspeech.2021-712"},{"key":"6337_CR28","doi-asserted-by":"crossref","unstructured":"Fasoli, A. et al. Accelerating inference and language model fusion of recurrent neural network transducers via end-to-end 4-bit quantization. In Proc. Interspeech 2022 2038\u20132042 (2022).","DOI":"10.21437\/Interspeech.2022-413"},{"key":"6337_CR29","doi-asserted-by":"crossref","unstructured":"Ding, S. et al. 4-bit conformer with native quantization aware training for speech recognition. Proc. 
Interspeech 2022 1711\u20131715 (2022).","DOI":"10.21437\/Interspeech.2022-10809"},{"key":"6337_CR30","first-page":"1796","volume":"33","author":"X Sun","year":"2020","unstructured":"Sun, X. et al. Ultra-low precision 4-bit training of deep neural networks. Adv. Neural Inf. Process. Syst. 33, 1796\u20131807 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"6337_CR31","doi-asserted-by":"publisher","first-page":"1078","DOI":"10.1109\/TED.2009.2016398","volume":"56","author":"S Lavizzari","year":"2009","unstructured":"Lavizzari, S., Ielmini, D., Sharma, D. & Lacaita, A. L. Reliability impact of chalcogenide-structure relaxation in phase-change memory (PCM) cells\u2014part II: physics-based modeling. IEEE Trans. Electron Devices 56, 1078\u20131085 (2009).","journal-title":"IEEE Trans. Electron Devices"},{"key":"6337_CR32","doi-asserted-by":"crossref","unstructured":"Biswas, A. & Chandrakasan, A. P. Conv-RAM: an energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications. In Proc. 2018 IEEE International Solid-State Circuits Conference (ISSCC) 488\u2013490 (IEEE, 2018).","DOI":"10.1109\/ISSCC.2018.8310397"},{"key":"6337_CR33","doi-asserted-by":"publisher","first-page":"8:1","DOI":"10.1147\/JRD.2019.2934050","volume":"63","author":"H-Y Chang","year":"2019","unstructured":"Chang, H.-Y. et al. AI hardware acceleration with analog memory: microarchitectures for low energy at high speed. IBM J. Res. Dev. 63, 8:1\u20138:14 (2019).","journal-title":"IBM J. Res. Dev."},{"key":"6337_CR34","doi-asserted-by":"crossref","unstructured":"Jiang, H., Li, W., Huang, S. & Yu, S. A 40nm analog-input ADC-free compute-in-memory RRAM macro with pulse-width modulation between sub-arrays. 
In 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits) 266\u2013267 (IEEE, 2022).","DOI":"10.1109\/VLSITechnologyandCir46769.2022.9830211"},{"key":"6337_CR35","doi-asserted-by":"crossref","unstructured":"Jia, H. et al. A programmable neural-network inference accelerator based on scalable in-memory computing. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 236\u2013238 (IEEE, 2021).","DOI":"10.1109\/ISSCC42613.2021.9365788"},{"key":"6337_CR36","doi-asserted-by":"crossref","unstructured":"Dong, Q. et al. A 351TOPS\/W and 372.4GOPS compute-in-memory SRAM macro in 7nm FinFET CMOS for machine-learning applications. In 2020 IEEE International Solid-State Circuits Conference (ISSCC) 242\u2013244 (IEEE, 2020).","DOI":"10.1109\/ISSCC19947.2020.9062985"},{"key":"6337_CR37","doi-asserted-by":"crossref","unstructured":"Chih, Y.-D. et al. An 89TOPS\/W and 16.3TOPS\/mm2 all-digital SRAM-based full-precision compute-in memory macro in 22nm for machine-learning edge applications. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 252\u2013254 (IEEE, 2021).","DOI":"10.1109\/ISSCC42613.2021.9365766"},{"key":"6337_CR38","doi-asserted-by":"crossref","unstructured":"Su, J.-W. et al. A 28nm 384kb 6T-SRAM computation-in-memory macro with 8b precision for AI edge chips. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 250\u2013252 (IEEE, 2021).","DOI":"10.1109\/ISSCC42613.2021.9365984"},{"key":"6337_CR39","doi-asserted-by":"crossref","unstructured":"Yoon, J.-H. et al. A 40nm 64Kb 56.67TOPS\/W read-disturb-tolerant compute-in-memory\/digital RRAM macro with active-feedback-based read and in-situ write verification. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 404\u2013406 (IEEE, 2021).","DOI":"10.1109\/ISSCC42613.2021.9365926"},{"key":"6337_CR40","doi-asserted-by":"crossref","unstructured":"Xue, C.-X. et al. 
A 22nm 4Mb 8b-precision ReRAM computing-in-memory macro with 11.91 to 195.7TOPS\/W for tiny AI edge devices. In 2021 IEEE International Solid-State Circuits Conference (ISSCC) 245\u2013247 (IEEE, 2021).","DOI":"10.1109\/ISSCC42613.2021.9365769"},{"key":"6337_CR41","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1109\/JETCAS.2018.2796379","volume":"8","author":"MJ Marinella","year":"2018","unstructured":"Marinella, M. J. et al. Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator. IEEE J. Emerg. Select. Topics Circuits Syst. 8, 86\u2013101 (2018).","journal-title":"IEEE J. Emerg. Select. Topics Circuits Syst."}],"container-title":["Nature"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41586-023-06337-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41586-023-06337-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41586-023-06337-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T06:06:22Z","timestamp":1692857182000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41586-023-06337-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,23]]},"references-count":41,"journal-issue":{"issue":"7975","published-print":{"date-parts":[[2023,8,24]]}},"alternative-id":["6337"],"URL":"https:\/\/doi.org\/10.1038\/s41586-023-06337-5","relation":{},"ISSN":["0028-0836","1476-4687"],"issn-type":[{"value":"0028-0836","type":"print"},{"value":"1476-4687","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,23]]},"assertion":[{"value":"13 December 
2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}