{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T01:00:32Z","timestamp":1775696432643,"version":"3.50.1"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"7","license":[{"start":{"date-parts":[[2020,6,18]],"date-time":"2020-06-18T00:00:00Z","timestamp":1592438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2020,6,18]]},"abstract":"<jats:p>Google's TPU supercomputers train deep neural networks 50x faster than general-purpose supercomputers running a high-performance computing benchmark.<\/jats:p>","DOI":"10.1145\/3360307","type":"journal-article","created":{"date-parts":[[2020,6,18]],"date-time":"2020-06-18T20:23:33Z","timestamp":1592511813000},"page":"67-78","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":210,"title":["A domain-specific supercomputer for training deep neural networks"],"prefix":"10.1145","volume":"63","author":[{"given":"Norman P.","family":"Jouppi","sequence":"first","affiliation":[{"name":"Google, Mountain View, CA"}]},{"given":"Doe Hyun","family":"Yoon","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}]},{"given":"George","family":"Kurian","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}]},{"given":"Sheng","family":"Li","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}]},{"given":"Nishant","family":"Patil","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}]},{"given":"James","family":"Laudon","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}]},{"given":"Cliff","family":"Young","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}]},{"given":"David","family":"Patterson","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA and University of California, Berkeley, CA"}]}],"member":"320","published-online":{"date-parts":[[2020,6,18]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Abadi M. et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. 2016; arXiv preprint arXiv:1603.04467.  Abadi M. et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. 2016; arXiv preprint arXiv:1603.04467."},{"key":"e_1_2_1_2_1","volume-title":"AI and compute","author":"Amodei D.","year":"2018","unstructured":"Amodei , D. and Hernandez , D . AI and compute , 2018 ; https:\/\/blog.openai.com\/aiandcompute. Amodei, D. and Hernandez, D. AI and compute, 2018; https:\/\/blog.openai.com\/aiandcompute."},{"key":"e_1_2_1_3_1","volume-title":"Programmable neurocomputing. The Handbook of Brain Theory and Neural Networks","author":"Asanovi\u0107 K.","year":"2002","unstructured":"Asanovi\u0107 , K. Programmable neurocomputing. The Handbook of Brain Theory and Neural Networks , 2 nd Edition, M.A. Arbib, ed. MIT Press , 2002 . Asanovi\u0107, K. Programmable neurocomputing. The Handbook of Brain Theory and Neural Networks, 2nd Edition, M.A. Arbib, ed. MIT Press, 2002.","edition":"2"},{"key":"e_1_2_1_4_1","unstructured":"Bahdanau D. Cho K. and Bengio Y. Neural machine translation by jointly learning to align and translate. 2014; arXiv preprint arXiv:1409.0473.  
Bahdanau D. Cho K. and Bengio Y. Neural machine translation by jointly learning to align and translate. 2014; arXiv preprint arXiv:1409.0473."},{"key":"e_1_2_1_5_1","unstructured":"Chen J. et al. Revisiting distributed synchronous SGD. 2016; arXiv preprint arXiv:1604.00981.  Chen J. et al. Revisiting distributed synchronous SGD. 2016; arXiv preprint arXiv:1604.00981."},{"key":"e_1_2_1_6_1","volume-title":"et al. The best of both worlds: Combining recent advances in neural machine translation. 2018","author":"Chen M.X.","year":"1804","unstructured":"Chen , M.X. et al. The best of both worlds: Combining recent advances in neural machine translation. 2018 ; arXiv preprint arXiv: 1804 .09849. Chen, M.X. et al. The best of both worlds: Combining recent advances in neural machine translation. 2018; arXiv preprint arXiv:1804.09849."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 47th Int'l Symp. on Microarchitecture","author":"Chen Y.","year":"2014","unstructured":"Chen , Y. et al. Dadiannao: A machine-learning supercomputer . In Proceedings of the 47th Int'l Symp. on Microarchitecture , ( 2014 ), 609--622. Chen, Y. et al. Dadiannao: A machine-learning supercomputer. In Proceedings of the 47th Int'l Symp. on Microarchitecture, (2014), 609--622."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the IEEE Int'l Conference on Acoustics, Speech and Signal Processing, (Apr.","author":"Chiu C.C.","year":"2018","unstructured":"Chiu , C.C. et al. State-of-the-art speech recognition with sequence-to-sequence models . In Proceedings of the IEEE Int'l Conference on Acoustics, Speech and Signal Processing, (Apr. 2018 ), 4774--4778. Chiu, C.C. et al. State-of-the-art speech recognition with sequence-to-sequence models. In Proceedings of the IEEE Int'l Conference on Acoustics, Speech and Signal Processing, (Apr. 2018), 4774--4778."},{"key":"e_1_2_1_9_1","volume-title":"Bloomberg Technology","author":"Clark J.","year":"2015","unstructured":"Clark , J. Google turning its lucrative Web search over to AI machines . Bloomberg Technology , Oct. 26, 2015 . Clark, J. Google turning its lucrative Web search over to AI machines. Bloomberg Technology, Oct. 26, 2015."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080248"},{"key":"e_1_2_1_11_1","volume-title":"et al. High-accuracy low-precision training. 2018","author":"De Sa C.","year":"1803","unstructured":"De Sa , C. et al. High-accuracy low-precision training. 2018 ; arXiv preprint arXiv: 1803 .03383. De Sa, C. et al. High-accuracy low-precision training. 2018; arXiv preprint arXiv:1803.03383."},{"key":"e_1_2_1_12_1","volume-title":"Advances in Neural Information Processing Systems","author":"Dean J.","year":"2012","unstructured":"Dean , J. et al. Large scale distributed deep networks . Advances in Neural Information Processing Systems , ( 2012 ), 1223--1231. Dean, J. et al. Large scale distributed deep networks. Advances in Neural Information Processing Systems, (2012), 1223--1231."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the SPEC Benchmark Workshop, (Jan. 2007)","author":"Dongarra J.","year":"2007","unstructured":"Dongarra , J. The HPC challenge benchmark: a candidate for replacing Linpack in the Top500 ? In Proceedings of the SPEC Benchmark Workshop, (Jan. 2007) ; www.spec.org\/workshops\/ 2007 \/austin\/slides\/Keynote_Jack_Dongarra.pdf. Dongarra, J. The HPC challenge benchmark: a candidate for replacing Linpack in the Top500? In Proceedings of the SPEC Benchmark Workshop, (Jan. 
2007); www.spec.org\/workshops\/2007\/austin\/slides\/Keynote_Jack_Dongarra.pdf."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2021068"},{"key":"e_1_2_1_15_1","unstructured":"Graphcore Intelligence Processing Unit. (https:\/\/www.graphcore.ai\/products\/ipu  Graphcore Intelligence Processing Unit. (https:\/\/www.graphcore.ai\/products\/ipu"},{"key":"e_1_2_1_16_1","volume-title":"Computer Architecture: A Quantitative Approach","author":"Hennessy J.L.","year":"2019","unstructured":"Hennessy , J.L. and Patterson , D.A . Computer Architecture: A Quantitative Approach , 6 th Edition. Elsevier , 2019 . Hennessy, J.L. and Patterson, D.A. Computer Architecture: A Quantitative Approach, 6th Edition. Elsevier, 2019.","edition":"6"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3282307"},{"key":"e_1_2_1_18_1","unstructured":"Ioffe S. and Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 2015; arXiv preprint arXiv:1502.03167.  Ioffe S. and Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 2015; arXiv preprint arXiv:1502.03167."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 44th Int'l Symp. on Computer Architecture, (June","author":"Jouppi N.P.","year":"2017","unstructured":"Jouppi , N.P. et al. In-datacenter performance analysis of a tensor processing unit . In Proceedings of the 44th Int'l Symp. on Computer Architecture, (June 2017 ), 1--12. Jouppi, N.P. et al. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Int'l Symp. on Computer Architecture, (June 2017), 1--12."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3154484"},{"key":"e_1_2_1_21_1","volume-title":"et al. A study of Bfloat16 for deep learning training. 2019","author":"Kalamkar D.","year":"1905","unstructured":"Kalamkar , D. et al. A study of Bfloat16 for deep learning training. 2019 ; arXiv preprint arXiv: 1905 .12322. Kalamkar, D. et al. A study of Bfloat16 for deep learning training. 2019; arXiv preprint arXiv:1905.12322."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 31st Conf. on Neural Information Processing Systems","author":"K\u00f6ster U.","year":"2017","unstructured":"K\u00f6ster , U. et al. Flexpoint: An adaptive numerical format for efficient training of deep neural networks . In Proceedings of the 31st Conf. on Neural Information Processing Systems , ( 2017 ). K\u00f6ster, U. et al. Flexpoint: An adaptive numerical format for efficient training of deep neural networks. In Proceedings of the 31st Conf. on Neural Information Processing Systems, (2017)."},{"key":"e_1_2_1_23_1","volume-title":"Algorithms for VLSI processor arrays. Introduction to VLSI Systems","author":"Kung H.T.","year":"1980","unstructured":"Kung , H.T. and Leiserson , C.E . Algorithms for VLSI processor arrays. Introduction to VLSI Systems , 1980 . Kung, H.T. and Leiserson, C.E. Algorithms for VLSI processor arrays. Introduction to VLSI Systems, 1980."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the IEEE Hot Chips 31 Symp., (Aug","author":"Lie S.","year":"2019","unstructured":"Lie , S. Wafer scale deep learning . In Proceedings of the IEEE Hot Chips 31 Symp., (Aug 2019 ). Lie, S. Wafer scale deep learning. In Proceedings of the IEEE Hot Chips 31 Symp., (Aug 2019)."},{"key":"e_1_2_1_25_1","volume-title":"et al. Mixed precision training with 8-bit floating point. 
2019","author":"Mellempudi N.","year":"1905","unstructured":"Mellempudi , N. et al. Mixed precision training with 8-bit floating point. 2019 ; arXiv preprint arXiv: 1905 .12334. Mellempudi, N. et al. Mixed precision training with 8-bit floating point. 2019; arXiv preprint arXiv:1905.12334."},{"key":"e_1_2_1_26_1","unstructured":"Micikevicius P. et al. Mixed precision training. 2017; arXiv preprint arXiv:1710.03740.  Micikevicius P. et al. Mixed precision training. 2017; arXiv preprint arXiv:1710.03740."},{"key":"e_1_2_1_27_1","volume-title":"et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems","author":"Mikolov T.","year":"2013","unstructured":"Mikolov , T. et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems ( 2013 ), 3111--3119. Mikolov, T. et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (2013), 3111--3119."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the IEEE Hot Chips 29 Symp., (Aug 2017I.","author":"Nicol C.","unstructured":"Nicol , C. A dataflow processing chip for training deep neural networks . In Proceedings of the IEEE Hot Chips 29 Symp., (Aug 2017I. Nicol, C. A dataflow processing chip for training deep neural networks. In Proceedings of the IEEE Hot Chips 29 Symp., (Aug 2017I."},{"key":"e_1_2_1_29_1","volume-title":"NLP, and representations. Colah's blog","author":"Olah C.","year":"2014","unstructured":"Olah , C. Deep learning , NLP, and representations. Colah's blog , 2014 ; http:\/\/colah.github.io\/posts\/2014-07-NLP-RNNs-Representations\/. Olah, C. Deep learning, NLP, and representations. Colah's blog, 2014; http:\/\/colah.github.io\/posts\/2014-07-NLP-RNNs-Representations\/."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/0041-5553(64)90137-5"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729586"},{"key":"e_1_2_1_32_1","volume-title":"et al. Measuring the effects of data parallelism on neural network training. 2018","author":"Shallue C.J.","year":"1811","unstructured":"Shallue , C.J. et al. Measuring the effects of data parallelism on neural network training. 2018 ; arXiv preprint arXiv: 1811 .03600. Shallue, C.J. et al. Measuring the effects of data parallelism on neural network training. 2018; arXiv preprint arXiv:1811.03600."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1364782.1364802"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.aar6404"},{"key":"e_1_2_1_35_1","volume-title":"Why the GPGPU is less efficient than the TPU for DNNs. Computer Architecture Today Blog","author":"Thottethodi M.","year":"2019","unstructured":"Thottethodi , M. and Vijaykumar , T . Why the GPGPU is less efficient than the TPU for DNNs. Computer Architecture Today Blog , 2019 ; www.sigarch.org\/why-the-gpgpu-is-less-efficientthan-the-tpu-for-dnns\/ Thottethodi, M. and Vijaykumar, T. Why the GPGPU is less efficient than the TPU for DNNs. Computer Architecture Today Blog, 2019; www.sigarch.org\/why-the-gpgpu-is-less-efficientthan-the-tpu-for-dnns\/"},{"key":"e_1_2_1_36_1","volume-title":"et al. Attention is all you need. Advances in Neural Information Processing Systems","author":"Vaswani A.","year":"2017","unstructured":"Vaswani , A. et al. Attention is all you need. 
Advances in Neural Information Processing Systems ( 2017 ), 5998--6008. Vaswani, A. et al. Attention is all you need. Advances in Neural Information Processing Systems (2017), 5998--6008."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 45th Int'l Symp. on Computer Architecture","author":"Venkataramani S.","year":"2017","unstructured":"Venkataramani , S. et al. Scaledeep: A scalable compute architecture for learning and evaluating deep networks . In Proceedings of the 45th Int'l Symp. on Computer Architecture , ( 2017 ), 13--26. Venkataramani, S. et al. Scaledeep: A scalable compute architecture for learning and evaluating deep networks. In Proceedings of the 45th Int'l Symp. on Computer Architecture, (2017), 13--26."},{"key":"e_1_2_1_38_1","series-title":"June 2019","volume-title":"Habana debuts record-breaking AI training chip,","author":"Ward-Foxton S.","unstructured":"Ward-Foxton , S. Habana debuts record-breaking AI training chip, ( June 2019 ); https:\/\/www.eetimes.com\/document.asp?doc_id=1334816. Ward-Foxton, S. Habana debuts record-breaking AI training chip, (June 2019); https:\/\/www.eetimes.com\/document.asp?doc_id=1334816."},{"key":"e_1_2_1_39_1","volume-title":"Rounding Errors in Algebraic Processes","author":"Wilkinson J.H.","year":"1963","unstructured":"Wilkinson , J.H. Rounding Errors in Algebraic Processes , 1 st Edition. Prentice Hall , Englewood Cliffs, NJ , 1963 . Wilkinson, J.H. Rounding Errors in Algebraic Processes, 1st Edition. Prentice Hall, Englewood Cliffs, NJ, 1963.","edition":"1"},{"key":"e_1_2_1_40_1","series-title":"Aug. 2019","volume-title":"Proceedings of the Hot Chips,","author":"Yang A.","unstructured":"Yang , A. Deep learning training at scale Spring Crest Deep Learning Accelerator (Intel\u00ae Nervana\u2122 NNP-T) . In Proceedings of the Hot Chips, ( Aug. 2019 ); www.hotchips.org\/hc31\/HC31_1.12_Intel_Intel.AndrewYang.v0.92.pdf. Yang, A. Deep learning training at scale Spring Crest Deep Learning Accelerator (Intel\u00ae Nervana\u2122 NNP-T). In Proceedings of the Hot Chips, (Aug. 2019); www.hotchips.org\/hc31\/HC31_1.12_Intel_Intel.AndrewYang.v0.92.pdf."},{"key":"e_1_2_1_41_1","volume-title":"et al. Image classification at supercomputer scale. 2018","author":"Ying C.","year":"1811","unstructured":"Ying , C. et al. Image classification at supercomputer scale. 2018 ; arXiv preprint arXiv: 1811 .06992. Ying, C. et al. Image classification at supercomputer scale. 2018; arXiv preprint arXiv:1811.06992."},{"key":"e_1_2_1_42_1","unstructured":"Zoph B. and Le Q.V. Neural architecture search with reinforcement learning. 2019; arXiv preprint arXiv:1611.01578.  Zoph B. and Le Q.V. Neural architecture search with reinforcement learning. 
2019; arXiv preprint arXiv:1611.01578."}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3360307","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3360307","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:29Z","timestamp":1750203869000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3360307"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,18]]},"references-count":42,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2020,6,18]]}},"alternative-id":["10.1145\/3360307"],"URL":"https:\/\/doi.org\/10.1145\/3360307","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"value":"0001-0782","type":"print"},{"value":"1557-7317","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,18]]},"assertion":[{"value":"2020-06-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}