{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,25]],"date-time":"2026-06-25T22:13:07Z","timestamp":1782425587886,"version":"3.54.5"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,7,25]],"date-time":"2019-07-25T00:00:00Z","timestamp":1564012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGOPS Oper. Syst. Rev."],"published-print":{"date-parts":[[2019,7,25]]},"abstract":"<jats:p>Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision), and can impact the final model's accuracy on unseen data. Due to a lack of standard evaluation criteria that considers these trade-offs, it is difficult to directly compare these optimizations. To address this problem, we recently introduced DAWNBENCH, a benchmark competition focused on end-to-end training time to achieve near-state-of-the-art accuracy on an unseen dataset: a combined metric called time-to-accuracy (TTA). In this work, we analyze the entries from DAWNBENCH, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries. We show that TTA has a low coefficient of variation and that models optimized for TTA generalize nearly as well as those trained using standard methods. 
Additionally, even though DAWNBENCH entries were able to train ImageNet models in under 3 minutes, we find they still underutilize hardware capabilities such as Tensor Cores. Furthermore, we find that distributed entries can spend more than half of their time on communication. We show similar findings with entries to the MLPERF v0.5 benchmark.<\/jats:p>","DOI":"10.1145\/3352020.3352024","type":"journal-article","created":{"date-parts":[[2019,7,26]],"date-time":"2019-07-26T13:17:18Z","timestamp":1564147038000},"page":"14-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":73,"title":["Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark"],"prefix":"10.1145","volume":"53","author":[{"given":"Cody","family":"Coleman","sequence":"first","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Daniel","family":"Kang","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Deepak","family":"Narayanan","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Luigi","family":"Nardi","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tian","family":"Zhao","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jian","family":"Zhang","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter","family":"Bailis","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kunle","family":"Olukotun","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chris","family":"R\u00e9","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matei","family":"Zaharia","sequence":"additional","affiliation":[{"name":"Stanford DAWN, Stanford, CA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2019,7,25]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Second conference on machine translation 2017.  Second conference on machine translation 2017."},{"key":"e_1_2_1_2_1","volume-title":"https:\/\/www.tensorflow.org\/ performance\/xla","author":"Tensorflow","year":"2017","unstructured":"Tensorflow xla overview. https:\/\/www.tensorflow.org\/ performance\/xla , 2017 . Tensorflow xla overview. https:\/\/www.tensorflow.org\/ performance\/xla, 2017."},{"key":"e_1_2_1_3_1","volume-title":"https:\/\/mlperf.org\/","year":"2018","unstructured":"MLPerf. https:\/\/mlperf.org\/ , 2018 . MLPerf. https:\/\/mlperf.org\/, 2018."},{"key":"e_1_2_1_4_1","volume-title":"OSDI","author":"TVM","year":"2018","unstructured":"TVM : An automated end-to-end optimizing compiler for deep learning . In OSDI , Carlsbad, CA , 2018 . USENIX Association. TVM: An automated end-to-end optimizing compiler for deep learning. In OSDI, Carlsbad, CA, 2018. USENIX Association."},{"key":"e_1_2_1_5_1","first-page":"265","volume-title":"OSDI","volume":"16","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , : A System for Large-Scale Machine Learning . In OSDI , volume 16 , pages 265 -- 283 , 2016 . 
Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A System for Large-Scale Machine Learning. In OSDI, volume 16, pages 265--283, 2016."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2016.7581275"},{"key":"e_1_2_1_7_1","volume-title":"Extremely large minibatch sgd: Training resnet-50 on imagenet in 15 minutes. arXiv preprint arXiv:1711.04325","author":"Akiba Takuya","year":"2017","unstructured":"Takuya Akiba , Shuji Suzuki , and Keisuke Fukuda . Extremely large minibatch sgd: Training resnet-50 on imagenet in 15 minutes. arXiv preprint arXiv:1711.04325 , 2017 . Takuya Akiba, Shuji Suzuki, and Keisuke Fukuda. Extremely large minibatch sgd: Training resnet-50 on imagenet in 15 minutes. arXiv preprint arXiv:1711.04325, 2017."},{"key":"e_1_2_1_8_1","volume-title":"Ai and compute","author":"Amodei Dario","year":"2018","unstructured":"Dario Amodei and Danny Hernandez . Ai and compute , 2018 . Dario Amodei and Danny Hernandez. Ai and compute, 2018."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2318857.2254766"},{"key":"e_1_2_1_10_1","volume-title":"Adaptive input representations for neural language modeling. arXiv preprint arXiv:1809.10853","author":"Baevski Alexei","year":"2018","unstructured":"Alexei Baevski and Michael Auli . Adaptive input representations for neural language modeling. arXiv preprint arXiv:1809.10853 , 2018 . Alexei Baevski and Michael Auli. Adaptive input representations for neural language modeling. arXiv preprint arXiv:1809.10853, 2018."},{"key":"e_1_2_1_11_1","volume-title":"Comparative Study of Deep Learning Software Frameworks. arXiv preprint arXiv:1511.06435","author":"Bahrampour Soheil","year":"2015","unstructured":"Soheil Bahrampour , Naveen Ramakrishnan , Lukas Schott , and Mohak Shah . Comparative Study of Deep Learning Software Frameworks. 
arXiv preprint arXiv:1511.06435 , 2015 . Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, and Mohak Shah. Comparative Study of Deep Learning Software Frameworks. arXiv preprint arXiv:1511.06435, 2015."},{"key":"e_1_2_1_12_1","volume-title":"DeepBench: Benchmarking Deep Learning Operations on Different Hardware. https:\/\/github.com\/baidu-research\/ DeepBench","year":"2017","unstructured":"Baidu. DeepBench: Benchmarking Deep Learning Operations on Different Hardware. https:\/\/github.com\/baidu-research\/ DeepBench , 2017 . Baidu. DeepBench: Benchmarking Deep Learning Operations on Different Hardware. https:\/\/github.com\/baidu-research\/ DeepBench, 2017."},{"key":"e_1_2_1_13_1","volume-title":"What is underfitting and overfitting in machine learning and how to deal with it","author":"Bhande Anup","year":"2018","unstructured":"Anup Bhande . What is underfitting and overfitting in machine learning and how to deal with it , 2018 . Anup Bhande. What is underfitting and overfitting in machine learning and how to deal with it, 2018."},{"key":"e_1_2_1_14_1","volume-title":"Making ncf reflect production usage","author":"Bittorf Victor","year":"2019","unstructured":"Victor Bittorf . Making ncf reflect production usage , 2019 . Victor Bittorf. Making ncf reflect production usage, 2019."},{"key":"e_1_2_1_15_1","first-page":"22","article-title":"Microsoft unveils Project Brainwave for Real-time AI. Microsoft Research","author":"Burger Doug","year":"2017","unstructured":"Doug Burger . Microsoft unveils Project Brainwave for Real-time AI. Microsoft Research , Microsoft , 22 , 2017 . Doug Burger. Microsoft unveils Project Brainwave for Real-time AI. Microsoft Research, Microsoft, 22, 2017.","journal-title":"Microsoft"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3084447"},{"key":"e_1_2_1_17_1","volume-title":"One billion word benchmark for measuring progress in statistical language modeling. 
arXiv preprint arXiv:1312.3005","author":"Chelba Ciprian","year":"2013","unstructured":"Ciprian Chelba , Tomas Mikolov , Mike Schuster , Qi Ge , Thorsten Brants , Phillipp Koehn , and Tony Robinson . One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005 , 2013 . Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005, 2013."},{"key":"e_1_2_1_18_1","volume-title":"Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274 , 2015 . Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274, 2015."},{"key":"e_1_2_1_19_1","volume-title":"cuDNN: Efficient Primitives for Deep Learning. arXiv preprint arXiv:1410.0759","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur , Cliff Woolley, Philippe Vandermersch , Jonathan Cohen , John Tran , Bryan Catanzaro , and Evan Shelhamer . cuDNN: Efficient Primitives for Deep Learning. arXiv preprint arXiv:1410.0759 , 2014 . Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. cuDNN: Efficient Primitives for Deep Learning. 
arXiv preprint arXiv:1410.0759, 2014."},{"key":"e_1_2_1_20_1","first-page":"571","volume-title":"OSDI","volume":"14","author":"Chilimbi Trishul M","year":"2014","unstructured":"Trishul M Chilimbi , Yutaka Suzue , Johnson Apacible , and Karthik Kalyanaraman . Project Adam : Building an Efficient and Scalable Deep Learning Training System . In OSDI , volume 14 , pages 571 -- 582 , 2014 . Trishul M Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. Project Adam: Building an Efficient and Scalable Deep Learning Training System. In OSDI, volume 14, pages 571--582, 2014."},{"key":"e_1_2_1_21_1","volume-title":"September","author":"Chintala Soumith","year":"2017","unstructured":"Soumith Chintala . Convnet-Benchmarks: Easy Benchmarking of All Publicly Accessible Implementations of Convnets. https:\/\/github. com\/soumith\/convnet-benchmarks , September 2017 . Soumith Chintala. Convnet-Benchmarks: Easy Benchmarking of All Publicly Accessible Implementations of Convnets. https:\/\/github. com\/soumith\/convnet-benchmarks, September 2017."},{"key":"e_1_2_1_22_1","volume-title":"Matei Zaharia. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop","author":"Coleman Cody","year":"2017","unstructured":"Cody Coleman , Deepak Narayanan , Daniel Kang , Tian Zhao , Jian Zhang , Luigi Nardi , Peter Bailis , Kunle Olukotun , Chris R\u00e9 , and Matei Zaharia. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop , 2017 . Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris R\u00e9, and Matei Zaharia. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop, 2017."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080248"},{"key":"e_1_2_1_24_1","volume-title":"Highaccuracy low-precision training. 
arXiv preprint arXiv:1803.03383","author":"Sa Christopher De","year":"2018","unstructured":"Christopher De Sa , Megan Leszczynski , Jian Zhang , Alana Marzoev , Christopher R Aberger , Kunle Olukotun , and Christopher R\u00e9 . Highaccuracy low-precision training. arXiv preprint arXiv:1803.03383 , 2018 . Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R Aberger, Kunle Olukotun, and Christopher R\u00e9. Highaccuracy low-precision training. arXiv preprint arXiv:1803.03383, 2018."},{"key":"e_1_2_1_25_1","volume-title":"NIPS","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean , Greg Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Mark Mao , Andrew Senior , Paul Tucker , Ke Yang , Quoc V Le , Large Scale Distributed Deep Networks . In NIPS , 2012 . Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. Large Scale Distributed Deep Networks. In NIPS, 2012."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3224419"},{"key":"e_1_2_1_28_1","first-page":"315","volume-title":"AISTATS","author":"Glorot Xavier","year":"2011","unstructured":"Xavier Glorot , Antoine Bordes , and Yoshua Bengio . Deep Sparse Rectifier Neural Networks . In AISTATS , pages 315 -- 323 , 2011 . Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep Sparse Rectifier Neural Networks. In AISTATS, pages 315--323, 2011."},{"key":"e_1_2_1_29_1","volume-title":"Deep learning","author":"Goodfellow Ian","year":"2016","unstructured":"Ian Goodfellow , Yoshua Bengio , and Aaron Courville . Deep learning . MIT press , 2016 . Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016."},{"key":"e_1_2_1_30_1","volume-title":"https:\/\/www.tensorflow.org\/ performance\/benchmarks","author":"Benchmarks TensorFlow","year":"2017","unstructured":"Google. 
TensorFlow Benchmarks . https:\/\/www.tensorflow.org\/ performance\/benchmarks , 2017 . Google. TensorFlow Benchmarks. https:\/\/www.tensorflow.org\/ performance\/benchmarks, 2017."},{"key":"e_1_2_1_31_1","volume-title":"Large Minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677","author":"Goyal Priya","year":"2017","unstructured":"Priya Goyal , Piotr Doll\u00e1r , Ross Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , and Kaiming He. Accurate , Large Minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677 , 2017 . Priya Goyal, Piotr Doll\u00e1r, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, Large Minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987554"},{"key":"e_1_2_1_33_1","volume-title":"SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and < 0.5 MB Model Size. arXiv preprint arXiv:1602.07360","author":"Iandola Forrest N","year":"2016","unstructured":"Forrest N Iandola , Song Han , Matthew W Moskewicz , Khalid Ashraf , William J Dally , and Kurt Keutzer . SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and < 0.5 MB Model Size. arXiv preprint arXiv:1602.07360 , 2016 . Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and < 0.5 MB Model Size. arXiv preprint arXiv:1602.07360, 2016."},{"key":"e_1_2_1_34_1","volume-title":"Bigdl: Distributed deep learning library for apache spark","year":"2019","unstructured":"Intel. Bigdl: Distributed deep learning library for apache spark , 2019 . Intel. 
Bigdl: Distributed deep learning library for apache spark, 2019."},{"key":"e_1_2_1_35_1","volume-title":"Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy . Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 , 2015 . Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_1_37_1","volume-title":"SysML","author":"Jia Zhihao","year":"2019","unstructured":"Zhihao Jia , Matei Zaharia , and Alex Aiken . Beyond data and model parallelism for deep neural networks . In SysML , 2019 . Zhihao Jia, Matei Zaharia, and Alex Aiken. Beyond data and model parallelism for deep neural networks. In SysML, 2019."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_2_1_39_1","volume-title":"Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410","author":"Jozefowicz Rafal","year":"2016","unstructured":"Rafal Jozefowicz , Oriol Vinyals , Mike Schuster , Noam Shazeer , and Yonghui Wu. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410 , 2016 . Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016."},{"key":"e_1_2_1_40_1","volume-title":"Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv preprint arXiv:1710.10196","author":"Karras Tero","year":"2017","unstructured":"Tero Karras , Timo Aila , Samuli Laine , and Jaakko Lehtinen . Progressive Growing of GANs for Improved Quality, Stability, and Variation. 
arXiv preprint arXiv:1710.10196 , 2017 . Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv preprint arXiv:1710.10196, 2017."},{"key":"e_1_2_1_41_1","volume-title":"ICLR","author":"Kingma Diederik P","year":"2015","unstructured":"Diederik P Kingma and Jimmy Ba. Adam : A Method for Stochastic Optimization . ICLR , 2015 . Diederik P Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. ICLR, 2015."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685095"},{"key":"e_1_2_1_43_1","first-page":"3","volume-title":"Kyoung Mu Lee. Enhanced Deep Residual Networks for Single Image Super- Resolution. In CVPR Workshops","volume":"1","author":"Lim Bee","year":"2017","unstructured":"Bee Lim , Sanghyun Son , Heewon Kim , Seungjun Nah , and Kyoung Mu Lee. Enhanced Deep Residual Networks for Single Image Super- Resolution. In CVPR Workshops , volume 1 , page 3 , 2017 . Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced Deep Residual Networks for Single Image Super- Resolution. In CVPR Workshops, volume 1, page 3, 2017."},{"key":"e_1_2_1_44_1","volume-title":"ICLR","author":"Lin Yujun","year":"2018","unstructured":"Yujun Lin , Song Han , Huizi Mao , Yu Wang , and Bill Dally . Deep gradient compression: Reducing the communication bandwidth for distributed training . In ICLR , 2018 . Yujun Lin, Song Han, Huizi Mao, Yu Wang, and Bill Dally. Deep gradient compression: Reducing the communication bandwidth for distributed training. In ICLR, 2018."},{"key":"e_1_2_1_45_1","volume-title":"Erwin Laure, Ivy Bo Peng, and Jeffrey S Vetter. Nvidia tensor core programmability, performance & precision. arXiv preprint arXiv:1803.04014","author":"Markidis Stefano","year":"2018","unstructured":"Stefano Markidis , Steven Wei Der Chien , Erwin Laure, Ivy Bo Peng, and Jeffrey S Vetter. Nvidia tensor core programmability, performance & precision. 
arXiv preprint arXiv:1803.04014 , 2018 . Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, and Jeffrey S Vetter. Nvidia tensor core programmability, performance & precision. arXiv preprint arXiv:1803.04014, 2018."},{"key":"e_1_2_1_46_1","volume-title":"Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint arXiv:1804.07612","author":"Masters Dominic","year":"2018","unstructured":"Dominic Masters and Carlo Luschi . Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint arXiv:1804.07612 , 2018 . Dominic Masters and Carlo Luschi. Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint arXiv:1804.07612, 2018."},{"key":"e_1_2_1_47_1","volume-title":"An empirical model of large-batch training. arXiv preprint arXiv:1812.06162","author":"McCandlish Sam","year":"2018","unstructured":"Sam McCandlish , Jared Kaplan , Dario Amodei , and Open AI Dota Team . An empirical model of large-batch training. arXiv preprint arXiv:1812.06162 , 2018 . Sam McCandlish, Jared Kaplan, Dario Amodei, and OpenAI Dota Team. An empirical model of large-batch training. arXiv preprint arXiv:1812.06162, 2018."},{"key":"e_1_2_1_48_1","volume-title":"Mixed Precision Training. arXiv preprint arXiv:1710.03740","author":"Micikevicius Paulius","year":"2017","unstructured":"Paulius Micikevicius , Sharan Narang , Jonah Alben , Gregory Diamos , Erich Elsen , David Garcia , Boris Ginsburg , Michael Houston , Oleksii Kuchaev , Ganesh Venkatesh , Mixed Precision Training. arXiv preprint arXiv:1710.03740 , 2017 . Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaev, Ganesh Venkatesh, et al. Mixed Precision Training. 
arXiv preprint arXiv:1710.03740, 2017."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/ALLERTON.2016.7852343"},{"key":"e_1_2_1_50_1","first-page":"807","volume-title":"ICML","author":"Nair Vinod","year":"2010","unstructured":"Vinod Nair and Geoffrey E Hinton . Rectified linear units improve restricted boltzmann machines . In ICML , pages 807 -- 814 , 2010 . Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, pages 807--814, 2010."},{"key":"e_1_2_1_51_1","first-page":"693","volume-title":"NIPS","author":"Niu Feng","year":"2011","unstructured":"Feng Niu , Benjamin Recht , Christopher Re , and Stephen Wright . Hogwild : A Lock-free Approach to Parallelizing Stochastic Gradient Descent . In NIPS , pages 693 -- 701 , 2011 . Feng Niu, Benjamin Recht, Christopher Re, and Stephen Wright. Hogwild: A Lock-free Approach to Parallelizing Stochastic Gradient Descent. In NIPS, pages 693--701, 2011."},{"key":"e_1_2_1_52_1","volume-title":"Automatic differentiation in pytorch","author":"Paszke Adam","year":"2017","unstructured":"Adam Paszke , Sam Gross , Soumith Chintala , Gregory Chanan , Edward Yang , Zachary DeVito , Zeming Lin , Alban Desmaison , Luca Antiga , and Adam Lerer . Automatic differentiation in pytorch . 2017 . Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017."},{"key":"e_1_2_1_53_1","volume-title":"Low-Power Robotics Applications.","author":"Pena Dexmont","year":"2017","unstructured":"Dexmont Pena , Andrew Forembski , Xiaofan Xu , and David Moloney . Benchmarking of CNNs for Low-Cost , Low-Power Robotics Applications. 2017 . Dexmont Pena, Andrew Forembski, Xiaofan Xu, and David Moloney. Benchmarking of CNNs for Low-Cost, Low-Power Robotics Applications. 
2017."},{"key":"e_1_2_1_54_1","volume-title":"Regularized Evolution for Image Classifier Architecture Search. arXiv preprint arXiv:1802.01548","author":"Real Esteban","year":"2018","unstructured":"Esteban Real , Alok Aggarwal , Yanping Huang , and Quoc V Le . Regularized Evolution for Image Classifier Architecture Search. arXiv preprint arXiv:1802.01548 , 2018 . Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized Evolution for Image Classifier Architecture Search. arXiv preprint arXiv:1802.01548, 2018."},{"key":"e_1_2_1_55_1","volume-title":"Do CIFAR-10 classifiers generalize to cifar-10? CoRR, abs\/1806.00451","author":"Recht Benjamin","year":"2018","unstructured":"Benjamin Recht , Rebecca Roelofs , Ludwig Schmidt , and Vaishaal Shankar . Do CIFAR-10 classifiers generalize to cifar-10? CoRR, abs\/1806.00451 , 2018 . Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do CIFAR-10 classifiers generalize to cifar-10? CoRR, abs\/1806.00451, 2018."},{"key":"e_1_2_1_56_1","first-page":"400","volume-title":"A stochastic approximation method. The annals of mathematical statistics","author":"Robbins Herbert","year":"1951","unstructured":"Herbert Robbins and Sutton Monro . A stochastic approximation method. The annals of mathematical statistics , pages 400 -- 407 , 1951 . Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400--407, 1951."},{"key":"e_1_2_1_57_1","volume-title":"Horovod: fast and easy distributed deep learning in tensorflow. arXiv preprint arXiv:1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso . Horovod: fast and easy distributed deep learning in tensorflow. arXiv preprint arXiv:1802.05799 , 2018 . Alexander Sergeev and Mike Del Balso. Horovod: fast and easy distributed deep learning in tensorflow. 
arXiv preprint arXiv:1802.05799, 2018."},{"key":"e_1_2_1_58_1","volume-title":"Cloud Computing and Big Data (CCBD)","author":"Shi Shaohuai","year":"2016","unstructured":"Shaohuai Shi , Qiang Wang , Pengfei Xu , and Xiaowen Chu . Benchmarking State-of-the-Art Deep Learning Software Tools . In Cloud Computing and Big Data (CCBD) . IEEE , 2016 . Shaohuai Shi, Qiang Wang, Pengfei Xu, and Xiaowen Chu. Benchmarking State-of-the-Art Deep Learning Software Tools. In Cloud Computing and Big Data (CCBD). IEEE, 2016."},{"key":"e_1_2_1_59_1","volume-title":"Don't decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489","author":"Smith Samuel L","year":"2017","unstructured":"Samuel L Smith , Pieter-Jan Kindermans , and Quoc V Le . Don't decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489 , 2017 . Samuel L Smith, Pieter-Jan Kindermans, and Quoc V Le. Don't decay the learning rate, increase the batch size. arXiv preprint arXiv:1711.00489, 2017."},{"key":"e_1_2_1_60_1","first-page":"604","volume-title":"ICML","author":"Sohl-Dickstein Jascha","year":"2014","unstructured":"Jascha Sohl-Dickstein , Ben Poole , and Surya Ganguli . Fast Largescale Optimization by Unifying Stochastic Gradient and Quasi-Newton Methods . In ICML , pages 604 -- 612 , 2014 . Jascha Sohl-Dickstein, Ben Poole, and Surya Ganguli. Fast Largescale Optimization by Unifying Stochastic Gradient and Quasi-Newton Methods. In ICML, pages 604--612, 2014."},{"key":"e_1_2_1_61_1","volume-title":"Revisiting unreasonable effectiveness of data in deep learning era. CoRR, abs\/1707.02968","author":"Sun Chen","year":"2017","unstructured":"Chen Sun , Abhinav Shrivastava , Saurabh Singh , and Abhinav Gupta . Revisiting unreasonable effectiveness of data in deep learning era. CoRR, abs\/1707.02968 , 2017 . Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. Revisiting unreasonable effectiveness of data in deep learning era. 
CoRR, abs\/1707.02968, 2017."},{"key":"e_1_2_1_62_1","first-page":"1139","volume-title":"ICML","author":"Sutskever Ilya","year":"2013","unstructured":"Ilya Sutskever , James Martens , George Dahl , and Geoffrey Hinton . On the Importance of Initialization and Momentum in Deep Learning . In ICML , pages 1139 -- 1147 , 2013 . Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the Importance of Initialization and Momentum in Deep Learning. In ICML, pages 1139--1147, 2013."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_1_64_1","first-page":"4148","volume-title":"NIPS","author":"Wilson Ashia C","year":"2017","unstructured":"Ashia C Wilson , Rebecca Roelofs , Mitchell Stern , Nati Srebro , and Benjamin Recht . The marginal value of adaptive gradient methods in machine learning . In NIPS , pages 4148 -- 4158 , 2017 . Ashia C Wilson, Rebecca Roelofs, Mitchell Stern, Nati Srebro, and Benjamin Recht. The marginal value of adaptive gradient methods in machine learning. In NIPS, pages 4148--4158, 2017."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3225058.3225069"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2733001"},{"key":"e_1_2_1_67_1","volume-title":"Tbd: Benchmarking and analyzing deep neural network training. arXiv preprint arXiv:1803.06905","author":"Zhu Hongyu","year":"2018","unstructured":"Hongyu Zhu , Mohamed Akrout , Bojian Zheng , Andrew Pelegris , Amar Phanishayee , Bianca Schroeder , and Gennady Pekhimenko . Tbd: Benchmarking and analyzing deep neural network training. arXiv preprint arXiv:1803.06905 , 2018 . Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, and Gennady Pekhimenko. Tbd: Benchmarking and analyzing deep neural network training. 
arXiv preprint arXiv:1803.06905, 2018."}],"container-title":["ACM SIGOPS Operating Systems Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3352020.3352024","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3352020.3352024","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:26:15Z","timestamp":1750206375000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3352020.3352024"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,25]]},"references-count":67,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,7,25]]}},"alternative-id":["10.1145\/3352020.3352024"],"URL":"https:\/\/doi.org\/10.1145\/3352020.3352024","relation":{},"ISSN":["0163-5980"],"issn-type":[{"value":"0163-5980","type":"print"}],"subject":[],"published":{"date-parts":[[2019,7,25]]},"assertion":[{"value":"2019-07-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}