{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,27]],"date-time":"2026-06-27T15:46:46Z","timestamp":1782575206444,"version":"3.54.5"},"publisher-location":"New York, NY, USA","reference-count":6,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T00:00:00Z","timestamp":1597881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,8,23]]},"DOI":"10.1145\/3394486.3406703","type":"proceedings-article","created":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T23:03:55Z","timestamp":1597964635000},"page":"3505-3506","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":692,"title":["DeepSpeed"],"prefix":"10.1145","author":[{"given":"Jeff","family":"Rasley","sequence":"first","affiliation":[{"name":"Microsoft, Bellevue, WA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Samyam","family":"Rajbhandari","sequence":"additional","affiliation":[{"name":"Microsoft, Bellevue, WA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Olatunji","family":"Ruwase","sequence":"additional","affiliation":[{"name":"Microsoft, Bellevue, WA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuxiong","family":"He","sequence":"additional","affiliation":[{"name":"Microsoft, Bellevue, WA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,8,20]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"8024","volume":"32","author":"Paszke Adam","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , 
Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024--8035. Curran Associates, Inc., 2019.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_2_1","volume-title":"Mixed precision training","author":"Micikevicius Paulius","year":"2017","unstructured":"Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. Mixed precision training, 2017."},{"key":"e_1_3_2_1_3_1","volume-title":"http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf","author":"NVIDIA","year":"2017","unstructured":"NVIDIA Tesla V100 GPU architecture. 
http:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf, 2017. [Online, accessed 22-April-2020]."},{"key":"e_1_3_2_1_4_1","volume-title":"ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. arXiv:1910.02054","author":"Rajbhandari Samyam","year":"2019","unstructured":"Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. arXiv:1910.02054, 2019."},{"key":"e_1_3_2_1_5_1","volume-title":"https:\/\/www.deepspeed.ai\/news\/2020\/05\/27\/fastest-bert-training.html","author":"Microsoft","year":"2020","unstructured":"Microsoft DeepSpeed achieves the fastest BERT training time. https:\/\/www.deepspeed.ai\/news\/2020\/05\/27\/fastest-bert-training.html, 2020."},{"key":"e_1_3_2_1_6_1","volume-title":"NVIDIA Clocks World's Fastest BERT Training Time... https:\/\/devblogs.nvidia.com\/training-bert-with-gpus\/","author":"Narasimhan Shar","year":"2019","unstructured":"Shar Narasimhan. NVIDIA Clocks World's Fastest BERT Training Time... https:\/\/devblogs.nvidia.com\/training-bert-with-gpus\/, 2019. 
[Online; accessed 25-September-2019]."}],"event":{"name":"KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Virtual Event CA USA","acronym":"KDD '20","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3406703","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394486.3406703","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:30Z","timestamp":1750195890000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3406703"}},"subtitle":["System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters"],"short-title":[],"issued":{"date-parts":[[2020,8,20]]},"references-count":6,"alternative-id":["10.1145\/3394486.3406703","10.1145\/3394486"],"URL":"https:\/\/doi.org\/10.1145\/3394486.3406703","relation":{},"subject":[],"published":{"date-parts":[[2020,8,20]]},"assertion":[{"value":"2020-08-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}