{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:32:33Z","timestamp":1750221153973,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,3,30]],"date-time":"2018-03-30T00:00:00Z","timestamp":1522368000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,3,30]]},"DOI":"10.1145\/3184407.3184424","type":"proceedings-article","created":{"date-parts":[[2018,4,4]],"date-time":"2018-04-04T12:25:45Z","timestamp":1522844745000},"page":"56-67","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Involving CPUs into Multi-GPU Deep Learning"],"prefix":"10.1145","author":[{"given":"Tung D.","family":"Le","sequence":"first","affiliation":[{"name":"IBM Research - Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taro","family":"Sekiyama","sequence":"additional","affiliation":[{"name":"IBM Research - Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yasushi","family":"Negishi","sequence":"additional","affiliation":[{"name":"IBM Research - Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haruki","family":"Imai","sequence":"additional","affiliation":[{"name":"IBM Research - Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kiyokuni","family":"Kawachiya","sequence":"additional","affiliation":[{"name":"IBM Research - Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,3,30]]},"reference":[{"volume-title":"IBM Power 
System S822LC for High Performance Computing. (Oct","year":"2016","key":"e_1_3_2_1_1_1","unstructured":"2016. IBM Power System S822LC for High Performance Computing. (Oct . 2016 ). http:\/\/www-03.ibm.com\/systems\/power\/hardware\/s822lc-hpc\/. 2016. IBM Power System S822LC for High Performance Computing. (Oct. 2016). http:\/\/www-03.ibm.com\/systems\/power\/hardware\/s822lc-hpc\/."},{"key":"e_1_3_2_1_2_1","unstructured":"2016. Torch. (Oct. 2016). http:\/\/torch.ch\/.  2016. Torch. (Oct. 2016). http:\/\/torch.ch\/."},{"key":"e_1_3_2_1_3_1","volume-title":"https:\/\/developer.nvidia.com\/nccl","author":"NVIDIA","year":"2017","unstructured":"2017. NVIDIA NCCL. ( 2017 ). https:\/\/developer.nvidia.com\/nccl . 2017. NVIDIA NCCL. (2017). https:\/\/developer.nvidia.com\/nccl."},{"volume-title":"https:\/\/luna16.grand-challenge.org","year":"2017","key":"e_1_3_2_1_4_1","unstructured":"2017. Torch. ( 2017 ). https:\/\/luna16.grand-challenge.org . 2017. Torch. (2017). https:\/\/luna16.grand-challenge.org."},{"key":"e_1_3_2_1_5_1","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http:\/\/tensorflow.org\/ Software available from tensorflow.org.  Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. 
Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Man\u00e9 Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Vi\u00e9gas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http:\/\/tensorflow.org\/ Software available from tensorflow.org."},{"key":"e_1_3_2_1_6_1","volume-title":"Mxnet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . 2015 . Mxnet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274 (2015). Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. Mxnet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv preprint arXiv:1512.01274 (2015)."},{"key":"e_1_3_2_1_7_1","volume-title":"Deep Learning with COTS HPC Systems. In International Conference on Machine Learning","volume":"28","author":"Coates Adam","year":"2013","unstructured":"Adam Coates , Brody Huval , Tao Wang , David Wu , Bryan Catanzaro , and Ng Andrew . 2013 . Deep Learning with COTS HPC Systems. In International Conference on Machine Learning , Vol. 28 . JMLR Workshop and Conference Proceedings, 1337--1345. Adam Coates, Brody Huval, Tao Wang, David Wu, Bryan Catanzaro, and Ng Andrew. 2013. Deep Learning with COTS HPC Systems. 
In International Conference on Machine Learning, Vol. 28. JMLR Workshop and Conference Proceedings, 1337--1345."},{"key":"e_1_3_2_1_8_1","volume-title":"NIPS Workshop.","author":"Collobert Ronan","year":"2011","unstructured":"Ronan Collobert , Koray Kavukcuoglu , and Cl\u00e9ment Farabet . 2011 . Torch7: A Matlab-like Environment for Machine Learning. In BigLearn , NIPS Workshop. Ronan Collobert, Koray Kavukcuoglu, and Cl\u00e9ment Farabet. 2011. Torch7: A Matlab-like Environment for Machine Learning. In BigLearn, NIPS Workshop."},{"volume-title":"Large Scale Distributed Deep Networks. In International Conference on Neural Information Processing Systems. 1232--1240","author":"Dean Jeffrey","key":"e_1_3_2_1_9_1","unstructured":"Jeffrey Dean , Greg S. Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Quoc V. Le , Mark Z. Mao , Marc'Aurelio Ranzato , Andrew Senior , Paul Tucker , Ke Yang , and Andrew Y. Ng . 2012 . Large Scale Distributed Deep Networks. In International Conference on Neural Information Processing Systems. 1232--1240 . Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng. 2012. Large Scale Distributed Deep Networks. In International Conference on Neural Information Processing Systems. 1232--1240."},{"volume-title":"Deep Learning","author":"Goodfellow Ian","key":"e_1_3_2_1_10_1","unstructured":"Ian Goodfellow , Yoshua Bengio , and Aaron Courville . 2016. Deep Learning . MIT Press . http:\/\/www.deeplearningbook.org. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http:\/\/www.deeplearningbook.org."},{"key":"e_1_3_2_1_11_1","volume-title":"Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385","author":"He Kaiming","year":"2015","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2015. Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385 ( 2015 ). 
http:\/\/arxiv.org\/ abs\/1512.03385 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs\/1512.03385 (2015). http:\/\/arxiv.org\/ abs\/1512.03385"},{"key":"e_1_3_2_1_12_1","volume-title":"Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. CoRR abs\/1502.01852","author":"He Kaiming","year":"2015","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. CoRR abs\/1502.01852 ( 2015 ). Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. CoRR abs\/1502.01852 (2015)."},{"volume-title":"Identity Mappings in Deep Residual Networks","author":"He Kaiming","key":"e_1_3_2_1_13_1","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2016. Identity Mappings in Deep Residual Networks . Springer International Publishing , 630--645. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Identity Mappings in Deep Residual Networks. Springer International Publishing, 630--645."},{"key":"e_1_3_2_1_14_1","volume-title":"International Conference on Neural Information Processing Systems. 1223--1231","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho , James Cipar , Henggang Cui , Seunghak Lee , Jin Kyu Kim , Phillip B. Gibbons , Garth A Gibson , Greg Ganger , and Eric P Xing . 2013 . More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server . In International Conference on Neural Information Processing Systems. 1223--1231 . Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B. Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing. 2013. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server. In International Conference on Neural Information Processing Systems. 
1223--1231."},{"key":"e_1_3_2_1_15_1","volume-title":"Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093","author":"Jia Yangqing","year":"2014","unstructured":"Yangqing Jia , Evan Shelhamer , Jeff Donahue , Sergey Karayev , Jonathan Long , Ross Girshick , Sergio Guadarrama , and Trevor Darrell . 2014 . Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014). Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014)."},{"key":"e_1_3_2_1_16_1","volume-title":"One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997v2","author":"Krizhevsky Alex","year":"2014","unstructured":"Alex Krizhevsky . 2014. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997v2 ( 2014 ). Alex Krizhevsky. 2014. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv preprint arXiv:1404.5997v2 (2014)."},{"volume-title":"ImageNet Classification with Deep Convolutional Neural Networks. In International Conference on Neural Information Processing Systems. 1097--1105","author":"Krizhevsky Alex","key":"e_1_3_2_1_17_1","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012 . ImageNet Classification with Deep Convolutional Neural Networks. In International Conference on Neural Information Processing Systems. 1097--1105 . Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In International Conference on Neural Information Processing Systems. 1097--1105."},{"key":"e_1_3_2_1_18_1","volume-title":"Building High-Level Features Using Large Scale Unsupervised Learning. 
In International Conference in Machine Learning.","author":"Le Quoc","year":"2012","unstructured":"Quoc Le , Marc'Aurelio Ranzato , Rajat Monga , Matthieu Devin , Kai Chen , Greg Corrado , Jeff Dean , and Andrew Ng . 2012 . Building High-Level Features Using Large Scale Unsupervised Learning. In International Conference in Machine Learning. Quoc Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg Corrado, Jeff Dean, and Andrew Ng. 2012. Building High-Level Features Using Large Scale Unsupervised Learning. In International Conference in Machine Learning."},{"key":"e_1_3_2_1_19_1","unstructured":"Azalia Mirhoseini Hieu Pham Quoc Le Mohammad Norouzi Samy Bengio Benoit Steiner Yuefeng Zhou Naveen Kumar Rasmus Larsen and Jeff Dean. 2017. Device Placement Optimization with Reinforcement Learning. https:\/\/arxiv.org\/abs\/1706.04972  Azalia Mirhoseini Hieu Pham Quoc Le Mohammad Norouzi Samy Bengio Benoit Steiner Yuefeng Zhou Naveen Kumar Rasmus Larsen and Jeff Dean. 2017. Device Placement Optimization with Reinforcement Learning. https:\/\/arxiv.org\/abs\/1706.04972"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_1_21_1","volume-title":"English Conversational Telephone Speech Recognition by Humans and Machines. CoRR abs\/1703.02136","author":"Saon George","year":"2017","unstructured":"George Saon , Gakuto Kurata , Tom Sercu , Kartik Audhkhasi , Samuel Thomas , Dimitrios Dimitriadis , Xiaodong Cui , Bhuvana Ramabhadran , Michael Picheny , Lynn-Li Lim , Bergul Roomi , and Phil Hall . 2017. English Conversational Telephone Speech Recognition by Humans and Machines. CoRR abs\/1703.02136 ( 2017 ). George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, and Phil Hall. 2017. English Conversational Telephone Speech Recognition by Humans and Machines. 
CoRR abs\/1703.02136 (2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"volume-title":"International Conference on Neural Information Processing Systems. 3104--3112","author":"Sutskever Ilya","key":"e_1_3_2_1_23_1","unstructured":"Ilya Sutskever , Oriol Vinyals , and Quoc V. Le . 2014. Sequence to Sequence Learning with Neural Networks . In International Conference on Neural Information Processing Systems. 3104--3112 . Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In International Conference on Neural Information Processing Systems. 3104--3112."},{"key":"e_1_3_2_1_24_1","volume-title":"Going Deeper with Convolutions. In IEEE Conference on Computer Vision and Pattern Recognition. 1--9.","author":"Szegedy Christian","year":"2015","unstructured":"Christian Szegedy , Wei Liu , Yangqing Jia , Pierre Sermanet , Scott E. Reed , Dragomir Anguelov , Dumitru Erhan , Vincent Vanhoucke , and Andrew Rabinovich . 2015 . Going Deeper with Convolutions. In IEEE Conference on Computer Vision and Pattern Recognition. 1--9. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going Deeper with Convolutions. In IEEE Conference on Computer Vision and Pattern Recognition. 1--9."},{"key":"e_1_3_2_1_26_1","volume-title":"Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines. 
arXiv preprint arXiv:1512.06216","author":"Zhang Hao","year":"2015","unstructured":"Hao Zhang , Zhiting Hu , Jinliang Wei , Pengtao Xie , Gunhee Kim , Qirong Ho , and Eric Xing . 2015 . Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines. arXiv preprint arXiv:1512.06216 (2015). Hao Zhang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Gunhee Kim, Qirong Ho, and Eric Xing. 2015. Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines. arXiv preprint arXiv:1512.06216 (2015)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/3060832.3060950"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733082"}],"event":{"name":"ICPE '18: ACM\/SPEC International Conference on Performance Engineering","sponsor":["SIGMETRICS ACM Special Interest Group on Measurement and Evaluation","SIGSOFT ACM Special Interest Group on Software Engineering"],"location":"Berlin Germany","acronym":"ICPE '18"},"container-title":["Proceedings of the 2018 ACM\/SPEC International Conference on Performance 
Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3184407.3184424","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3184407.3184424","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:08:30Z","timestamp":1750208910000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3184407.3184424"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,3,30]]},"references-count":27,"alternative-id":["10.1145\/3184407.3184424","10.1145\/3184407"],"URL":"https:\/\/doi.org\/10.1145\/3184407.3184424","relation":{},"subject":[],"published":{"date-parts":[[2018,3,30]]},"assertion":[{"value":"2018-03-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}