{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T15:48:50Z","timestamp":1778860130779,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":57,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,27]],"date-time":"2019-10-27T00:00:00Z","timestamp":1572134400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,27]]},"DOI":"10.1145\/3341301.3359646","type":"proceedings-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T13:34:22Z","timestamp":1571664862000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":663,"title":["PipeDream"],"prefix":"10.1145","author":[{"given":"Deepak","family":"Narayanan","sequence":"first","affiliation":[{"name":"Microsoft Research and Stanford University"}]},{"given":"Aaron","family":"Harlap","sequence":"additional","affiliation":[{"name":"Microsoft Research and Carnegie Mellon University"}]},{"given":"Amar","family":"Phanishayee","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Vivek","family":"Seshadri","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Nikhil R.","family":"Devanur","sequence":"additional","affiliation":[{"name":"Microsoft Research"}]},{"given":"Gregory R.","family":"Ganger","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Phillip B.","family":"Gibbons","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Matei","family":"Zaharia","sequence":"additional","affiliation":[{"name":"Stanford University"}]}],"member":"320","published-online":{"date-parts":[[2019,10,27]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2019. Gloo. https:\/\/github.com\/facebookincubator\/gloo.  2019. Gloo. https:\/\/github.com\/facebookincubator\/gloo."},{"key":"e_1_3_2_1_2_1","unstructured":"2019. MLPerf. https:\/\/www.mlperf.org\/.  2019. MLPerf. https:\/\/www.mlperf.org\/."},{"key":"e_1_3_2_1_3_1","unstructured":"2019. NCCL. https:\/\/developer.nvidia.com\/nccl.  2019. NCCL. https:\/\/developer.nvidia.com\/nccl."},{"key":"e_1_3_2_1_4_1","unstructured":"2019. NVLink. https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink\/.  2019. NVLink. https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink\/."},{"key":"e_1_3_2_1_5_1","unstructured":"2019. PyTorch. https:\/\/github.com\/pytorch\/pytorch.  2019. PyTorch. https:\/\/github.com\/pytorch\/pytorch."},{"key":"e_1_3_2_1_6_1","unstructured":"2019. PyTorch DDP. https:\/\/pytorch.org\/docs\/stable\/_modules\/torch\/nn\/parallel\/distributed.html.  2019. PyTorch DDP. https:\/\/pytorch.org\/docs\/stable\/_modules\/torch\/nn\/parallel\/distributed.html."},{"key":"e_1_3_2_1_7_1","unstructured":"2019. VGG-16 target accuracy using Caffe model. https:\/\/gist.github.com\/ksimonyan\/211839e770f7b538e2d8#gistcomment-1403727.  2019. VGG-16 target accuracy using Caffe model. https:\/\/gist.github.com\/ksimonyan\/211839e770f7b538e2d8#gistcomment-1403727."},{"key":"e_1_3_2_1_8_1","volume-title":"TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , and Xiaoqiang Zheng . 2016 . TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) . GA, 265--283. https:\/\/www.tensorflow.org\/ Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). GA, 265--283. https:\/\/www.tensorflow.org\/"},{"key":"e_1_3_2_1_9_1","unstructured":"Baidu Inc. 2017. Bringing HPC Techniques to Deep Learning. http:\/\/research.baidu.com\/bringing-hpc-techniques-deep-learning\/  Baidu Inc. 2017. Bringing HPC Techniques to Deep Learning. http:\/\/research.baidu.com\/bringing-hpc-techniques-deep-learning\/"},{"key":"e_1_3_2_1_10_1","unstructured":"L\u00e9on Bottou and Olivier Bousquet. 2008. The Tradeoffs of Large Scale Learning. In Advances in Neural Information Processing Systems. 161--168.  L\u00e9on Bottou and Olivier Bousquet. 2008. The Tradeoffs of Large Scale Learning. In Advances in Neural Information Processing Systems. 161--168."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002497"},{"key":"e_1_3_2_1_12_1","volume-title":"Revisiting Distributed Synchronous SGD. arXiv preprint arXiv:1604.00981","author":"Chen Jianmin","year":"2016","unstructured":"Jianmin Chen , Xinghao Pan , Rajat Monga , Samy Bengio , and Rafal Jozefowicz . 2016. Revisiting Distributed Synchronous SGD. arXiv preprint arXiv:1604.00981 ( 2016 ). Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, and Rafal Jozefowicz. 2016. Revisiting Distributed Synchronous SGD. arXiv preprint arXiv:1604.00981 (2016)."},{"key":"e_1_3_2_1_13_1","volume-title":"MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs\/1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs\/1512.01274 ( 2015 ). http:\/\/arxiv.org\/abs\/1512.01274 Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs\/1512.01274 (2015). http:\/\/arxiv.org\/abs\/1512.01274"},{"key":"e_1_3_2_1_14_1","volume-title":"Training Deep Nets with Sublinear Memory Cost. arXiv preprint arXiv:1604.06174","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen , Bing Xu , Chiyuan Zhang , and Carlos Guestrin . 2016. Training Deep Nets with Sublinear Memory Cost. arXiv preprint arXiv:1604.06174 ( 2016 ). Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training Deep Nets with Sublinear Memory Cost. arXiv preprint arXiv:1604.06174 (2016)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Xie Chen Adam Eversole Gang Li Dong Yu and Frank Seide. 2012. Pipelined Back-Propagation for Context-dependent Deep Neural Networks. In Interspeech.  Xie Chen Adam Eversole Gang Li Dong Yu and Frank Seide. 2012. Pipelined Back-Propagation for Context-dependent Deep Neural Networks. In Interspeech.","DOI":"10.21437\/Interspeech.2012-7"},{"key":"e_1_3_2_1_16_1","volume-title":"11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14)","volume":"14","author":"Chilimbi Trishul M","year":"2014","unstructured":"Trishul M Chilimbi , Yutaka Suzue , Johnson Apacible , and Karthik Kalyanaraman . 2014 . Project Adam: Building an Efficient and Scalable Deep Learning Training System .. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14) , Vol. 14 . 571--582. Trishul M Chilimbi, Yutaka Suzue, Johnson Apacible, and Karthik Kalyanaraman. 2014. Project Adam: Building an Efficient and Scalable Deep Learning Training System.. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI '14), Vol. 14. 571--582."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352020.3352024"},{"key":"e_1_3_2_1_18_1","volume-title":"DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop","author":"Coleman Cody","year":"2017","unstructured":"Cody Coleman , Deepak Narayanan , Daniel Kang , Tian Zhao , Jian Zhang , Luigi Nardi , Peter Bailis , Kunle Olukotun , Chris R\u00e9 , and Matei Zaharia . 2017 . DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop (2017). Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris R\u00e9, and Matei Zaharia. 2017. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. NIPS ML Systems Workshop (2017)."},{"key":"e_1_3_2_1_19_1","volume-title":"USENIX Annual Technical Conference. 37--48","author":"Cui Henggang","year":"2014","unstructured":"Henggang Cui , James Cipar , Qirong Ho , Jin Kyu Kim , Seunghak Lee , Abhimanu Kumar , Jinliang Wei , Wei Dai , Gregory R Ganger , Phillip B Gibbons , 2014 . Exploiting Bounded Staleness to Speed Up Big Data Analytics . In USENIX Annual Technical Conference. 37--48 . Henggang Cui, James Cipar, Qirong Ho, Jin Kyu Kim, Seunghak Lee, Abhimanu Kumar, Jinliang Wei, Wei Dai, Gregory R Ganger, Phillip B Gibbons, et al. 2014. Exploiting Bounded Staleness to Speed Up Big Data Analytics. In USENIX Annual Technical Conference. 37--48."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901318.2901323"},{"key":"e_1_3_2_1_21_1","unstructured":"Jeffrey Dean Greg Corrado Rajat Monga Kai Chen Matthieu Devin Mark Mao Andrew Senior Paul Tucker Ke Yang Quoc V Le etal 2012. Large Scale Distributed Deep Networks. In Advances in Neural Information Processing Systems. 1223--1231.  Jeffrey Dean Greg Corrado Rajat Monga Kai Chen Matthieu Devin Mark Mao Andrew Senior Paul Tucker Ke Yang Quoc V Le et al. 2012. Large Scale Distributed Deep Networks. In Advances in Neural Information Processing Systems. 1223--1231."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3348"},{"key":"e_1_3_2_1_23_1","unstructured":"DGX-1 [n. d.]. NVIDIA DGX-1. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-1\/.  DGX-1 [n. d.]. NVIDIA DGX-1. https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-1\/."},{"key":"e_1_3_2_1_24_1","volume-title":"Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677","author":"Goyal Priya","year":"2017","unstructured":"Priya Goyal , Piotr Doll\u00e1r , Ross Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , and Kaiming He. 2017. Accurate , Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677 ( 2017 ). Priya Goyal, Piotr Doll\u00e1r, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677 (2017)."},{"key":"e_1_3_2_1_25_1","volume-title":"PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv preprint arXiv:1806.03377","author":"Harlap Aaron","year":"2018","unstructured":"Aaron Harlap , Deepak Narayanan , Amar Phanishayee , Vivek Seshadri , Nikhil Devanur , Greg Ganger , and Phil Gibbons . 2018. PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv preprint arXiv:1806.03377 ( 2018 ). Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, and Phil Gibbons. 2018. PipeDream: Fast and Efficient Pipeline Parallel DNN Training. arXiv preprint arXiv:1806.03377 (2018)."},{"key":"e_1_3_2_1_26_1","volume-title":"Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385","author":"He Kaiming","year":"2015","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2015. Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385 ( 2015 ). http:\/\/arxiv.org\/abs\/1512.03385 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385 (2015). http:\/\/arxiv.org\/abs\/1512.03385"},{"key":"e_1_3_2_1_27_1","volume-title":"Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing.","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho , James Cipar , Henggang Cui , Seunghak Lee , Jin Kyu Kim , Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing. 2013 . More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server. In Advances in Neural Information Processing Systems . 1223--1231. Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing. 2013. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server. In Advances in Neural Information Processing Systems. 1223--1231."},{"key":"e_1_3_2_1_28_1","volume-title":"GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. arXiv preprint arXiv:1811.06965","author":"Huang Yanping","year":"2018","unstructured":"Yanping Huang , Yonglong Cheng , Dehao Chen , HyoukJoong Lee , Jiquan Ngiam , Quoc V Le , and Zhifeng Chen . 2018. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. arXiv preprint arXiv:1811.06965 ( 2018 ). Yanping Huang, Yonglong Cheng, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, and Zhifeng Chen. 2018. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. arXiv preprint arXiv:1811.06965 (2018)."},{"key":"e_1_3_2_1_29_1","volume-title":"Decoupled Parallel Backpropagation with Convergence Guarantee. ICML-18, arXiv preprint arXiv:1804.10574","author":"Huo Zhouyuan","year":"2018","unstructured":"Zhouyuan Huo , Bin Gu , Qian Yang , and Heng Huang . 2018. Decoupled Parallel Backpropagation with Convergence Guarantee. ICML-18, arXiv preprint arXiv:1804.10574 ( 2018 ). Zhouyuan Huo, Bin Gu, Qian Yang, and Heng Huang. 2018. Decoupled Parallel Backpropagation with Convergence Guarantee. ICML-18, arXiv preprint arXiv:1804.10574 (2018)."},{"key":"e_1_3_2_1_30_1","volume-title":"Gist: Efficient Data Encoding for Deep Neural Network Training. In ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA '18)","author":"Jain Animesh","year":"2018","unstructured":"Animesh Jain , Amar Phanishayee , Jason Mars , Lingjia Tang , and Gennady Pekhimenko . 2018 . Gist: Efficient Data Encoding for Deep Neural Network Training. In ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA '18) . Animesh Jain, Amar Phanishayee, Jason Mars, Lingjia Tang, and Gennady Pekhimenko. 2018. Gist: Efficient Data Encoding for Deep Neural Network Training. In ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA '18)."},{"key":"e_1_3_2_1_31_1","unstructured":"Xianyan Jia Shutao Song Wei He Yangzihao Wang Haidong Rong Feihu Zhou Liqiang Xie Zhenyu Guo Yuanzhou Yang Liwei Yu etal 2018. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes. arXiv preprint arXiv:1807.11205 (2018).  Xianyan Jia Shutao Song Wei He Yangzihao Wang Haidong Rong Feihu Zhou Liqiang Xie Zhenyu Guo Yuanzhou Yang Liwei Yu et al. 2018. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes. arXiv preprint arXiv:1807.11205 (2018)."},{"key":"e_1_3_2_1_32_1","volume-title":"Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093","author":"Jia Yangqing","year":"2014","unstructured":"Yangqing Jia , Evan Shelhamer , Jeff Donahue , Sergey Karayev , Jonathan Long , Ross Girshick , Sergio Guadarrama , and Trevor Darrell . 2014 . Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014). Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014)."},{"key":"e_1_3_2_1_33_1","volume-title":"Proceedings of the 28th International Conference on Machine Learning (ICML '18)","author":"Jia Zhihao","year":"2018","unstructured":"Zhihao Jia , Sina Lin , Charles R Qi , and Alex Aiken . 2018 . Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks . In Proceedings of the 28th International Conference on Machine Learning (ICML '18) . Zhihao Jia, Sina Lin, Charles R Qi, and Alex Aiken. 2018. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks. In Proceedings of the 28th International Conference on Machine Learning (ICML '18)."},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the 2nd SysML Conference, SysML '19","author":"Jia Zhihao","year":"2019","unstructured":"Zhihao Jia , Matei Zaharia , and Alex Aiken . 2019 . Beyond Data and Model Parallelism for Deep Neural Networks . In Proceedings of the 2nd SysML Conference, SysML '19 . Palo Alto, CA, USA. Zhihao Jia, Matei Zaharia, and Alex Aiken. 2019. Beyond Data and Model Parallelism for Deep Neural Networks. In Proceedings of the 2nd SysML Conference, SysML '19. Palo Alto, CA, USA."},{"key":"e_1_3_2_1_35_1","volume-title":"Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba . 2014 . Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_36_1","volume-title":"One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997","author":"Krizhevsky Alex","year":"2014","unstructured":"Alex Krizhevsky . 2014. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997 ( 2014 ). Alex Krizhevsky. 2014. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997 (2014)."},{"key":"e_1_3_2_1_37_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems. 1097--1105.  Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems. 1097--1105."},{"key":"e_1_3_2_1_38_1","volume-title":"Scaling Distributed Machine Learning with the Parameter Server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14)","volume":"1","author":"Li Mu","year":"2014","unstructured":"Mu Li , David G Andersen , Jun Woo Park , Alexander J Smola , Amr Ahmed , Vanja Josifovski , James Long , Eugene J Shekita , and Bor-Yiing Su . 2014 . Scaling Distributed Machine Learning with the Parameter Server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14) , Vol. 1 . 3. Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI'14), Vol. 1. 3."},{"key":"e_1_3_2_1_39_1","volume-title":"Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint arXiv:1804.07612","author":"Masters Dominic","year":"2018","unstructured":"Dominic Masters and Carlo Luschi . 2018. Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint arXiv:1804.07612 ( 2018 ). Dominic Masters and Carlo Luschi. 2018. Revisiting Small Batch Training for Deep Neural Networks. arXiv preprint arXiv:1804.07612 (2018)."},{"key":"e_1_3_2_1_40_1","volume-title":"Nitish Shirish Keskar, and Richard Socher","author":"Merity Stephen","year":"2017","unstructured":"Stephen Merity , Nitish Shirish Keskar, and Richard Socher . 2017 . Regularizing and Optimizing LSTM Language Models . arXiv preprint arXiv:1708.02182 (2017). Stephen Merity, Nitish Shirish Keskar, and Richard Socher. 2017. Regularizing and Optimizing LSTM Language Models. arXiv preprint arXiv:1708.02182 (2017)."},{"key":"e_1_3_2_1_41_1","volume-title":"Recurrent Neural Network Based Language Model. In Eleventh Annual Conference of the International Speech Communication Association.","author":"Mikolov Tom\u00e1\u0161","year":"2010","unstructured":"Tom\u00e1\u0161 Mikolov , Martin Karafi\u00e3t , Luk\u00e1\u0161 Burget , Jan \u010cernock\u1ef3 , and Sanjeev Khudanpur . 2010 . Recurrent Neural Network Based Language Model. In Eleventh Annual Conference of the International Speech Communication Association. Tom\u00e1\u0161 Mikolov, Martin Karafi\u00e3t, Luk\u00e1\u0161 Burget, Jan \u010cernock\u1ef3, and Sanjeev Khudanpur. 2010. Recurrent Neural Network Based Language Model. In Eleventh Annual Conference of the International Speech Communication Association."},{"key":"e_1_3_2_1_42_1","unstructured":"Azalia Mirhoseini Hieu Pham Quoc Le Mohammad Norouzi Samy Bengio Benoit Steiner Yuefeng Zhou Naveen Kumar Rasmus Larsen and Jeff Dean. 2017. Device Placement Optimization with Reinforcement Learning. https:\/\/arxiv.org\/abs\/1706.04972  Azalia Mirhoseini Hieu Pham Quoc Le Mohammad Norouzi Samy Bengio Benoit Steiner Yuefeng Zhou Naveen Kumar Rasmus Larsen and Jeff Dean. 2017. Device Placement Optimization with Reinforcement Learning. https:\/\/arxiv.org\/abs\/1706.04972"},{"key":"e_1_3_2_1_43_1","unstructured":"Benjamin Recht Christopher Re Stephen Wright and Feng Niu. 2011. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In Advances in Neural Information Processing Systems. 693--701.  Benjamin Recht Christopher Re Stephen Wright and Feng Niu. 2011. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In Advances in Neural Information Processing Systems. 693--701."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2945397"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2014-274"},{"key":"e_1_3_2_1_47_1","volume-title":"On Parallelizability of Stochastic Gradient Descent for Speech DNNs. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE SPS.","author":"Seide Frank","year":"2014","unstructured":"Frank Seide , Hao Fu , Jasha Droppo , Gang Li , and Dong Yu . 2014 . On Parallelizability of Stochastic Gradient Descent for Speech DNNs. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE SPS. Frank Seide, Hao Fu, Jasha Droppo, Gang Li, and Dong Yu. 2014. On Parallelizability of Stochastic Gradient Descent for Speech DNNs. In International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE SPS."},{"key":"e_1_3_2_1_48_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806777.2806945"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342005051521"},{"key":"e_1_3_2_1_51_1","volume-title":"Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow. https:\/\/eng.uber.com\/horovod\/","author":"Uber Technologies Inc.","year":"2017","unstructured":"Uber Technologies Inc. 2017 . Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow. https:\/\/eng.uber.com\/horovod\/ Uber Technologies Inc. 2017. Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow. https:\/\/eng.uber.com\/horovod\/"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79181"},{"key":"e_1_3_2_1_53_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is All You Need. In Advances in Neural Information Processing Systems. 5998--6008.  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is All You Need. In Advances in Neural Information Processing Systems. 5998--6008."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_1_55_1","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey etal 2016. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144 (2016).  Yonghui Wu Mike Schuster Zhifeng Chen Quoc V Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et al. 2016. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv preprint arXiv:1609.08144 (2016)."},{"key":"e_1_3_2_1_56_1","volume-title":"Large Batch Training of Convolutional Networks. arXiv preprint arXiv:1708.03888","author":"You Yang","year":"2017","unstructured":"Yang You , Igor Gitman , and Boris Ginsburg . 2017. Large Batch Training of Convolutional Networks. arXiv preprint arXiv:1708.03888 ( 2017 ). Yang You, Igor Gitman, and Boris Ginsburg. 2017. Large Batch Training of Convolutional Networks. arXiv preprint arXiv:1708.03888 (2017)."},{"key":"e_1_3_2_1_57_1","volume-title":"Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. In 2017 USENIX Annual Technical Conference (USENIX ATC 17)","author":"Zhang Hao","unstructured":"Hao Zhang , Zeyu Zheng , Shizhen Xu , Wei Dai , Qirong Ho , Xiaodan Liang , Zhiting Hu , Jinliang Wei , Pengtao Xie , and Eric P. Xing . 2017 . Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. In 2017 USENIX Annual Technical Conference (USENIX ATC 17) . USENIX Association, Santa Clara, CA, 181--193. Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P. Xing. 2017. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. In 2017 USENIX Annual Technical Conference (USENIX ATC 17). USENIX Association, Santa Clara, CA, 181--193."}],"event":{"name":"SOSP '19: ACM SIGOPS 27th Symposium on Operating Systems Principles","location":"Huntsville Ontario Canada","acronym":"SOSP '19","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems","USENIX Assoc USENIX Assoc"]},"container-title":["Proceedings of the 27th ACM Symposium on Operating Systems Principles"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3341301.3359646","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3341301.3359646","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:12:56Z","timestamp":1750201976000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3341301.3359646"}},"subtitle":["generalized pipeline parallelism for DNN training"],"short-title":[],"issued":{"date-parts":[[2019,10,27]]},"references-count":57,"alternative-id":["10.1145\/3341301.3359646","10.1145\/3341301"],"URL":"https:\/\/doi.org\/10.1145\/3341301.3359646","relation":{},"subject":[],"published":{"date-parts":[[2019,10,27]]},"assertion":[{"value":"2019-10-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}